Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
On 08/12/2014 02:50 AM, Christoffer Dall wrote:
On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
On 08/11/2014 12:13 PM, Christoffer Dall wrote:
On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:

[...]

@@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	pte_t *pte = (pte_t *)data;

-	stage2_set_pte(kvm, NULL, gpa, pte, false);
+	stage2_set_pte(kvm, NULL, gpa, pte, false, false);

Why is logging never active if we are called from MMU notifiers?

MMU notifiers update sptes, but I don't see how these updates can result in guest dirty pages. Also, guest pages are marked dirty from the 2nd stage page fault handlers (searching through the code).

Ok, then add:

/*
 * We can always call stage2_set_pte with logging_active == false,
 * because MMU notifiers will have unmapped a huge PMD before calling
 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
 * stage2_set_pte() never needs to clear out a huge PMD through this
 * calling path.
 */

So here, on a permission change to primary PTEs, the kernel first invalidates the related s2ptes, followed by ->change_pte() calls to synchronize the s2ptes. As a consequence of the invalidation we unmap huge PMDs, if a page falls in that range. Is the comment meant to point out the use of the logging flag under various scenarios?

The comment is there because when you look at this function it is not obvious why we pass logging_active=false, even though logging may actually be active. This could suggest that the parameter to stage2_set_pte() should be named differently (break_huge_pmds, or something like that), but we can also be satisfied with the comment.

Should I add comments on flag use in other places as well?

It's always a judgement call. I didn't find it necessary to put a comment elsewhere because I think it's pretty obvious that we would never care about logging writes to device regions.
However, this made me think, are we making sure that we are not marking device mappings as read-only in the wp_range functions? I think it's quite bad if we mark the VCPU interface as read-only for example. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
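To make the argument about the MMU-notifier path concrete, here is a small stand-alone model of the invariant being discussed. This is not the kernel code; the function names (`needs_pmd_break`, `mmu_notifier_needs_pmd_break`) are invented for illustration only — the real logic lives in stage2_set_pte() and the comment quoted above.

```c
#include <stdbool.h>

/* Hypothetical sketch, not kernel code: models when stage2_set_pte() would
 * have to break a huge PMD into page-granularity PTEs. Only the dirty-logging
 * path ever needs this, so that it can write-protect individual pages. */
static bool needs_pmd_break(bool logging_active, bool mapping_is_huge)
{
    return logging_active && mapping_is_huge;
}

/* Models the MMU-notifier path: ->change_pte() (which in turn calls
 * kvm_set_spte_hva()) runs only after the notifier has already unmapped any
 * huge PMD covering the address, so the mapping is never huge on this path
 * and passing logging_active == false is always safe. */
static bool mmu_notifier_needs_pmd_break(void)
{
    return needs_pmd_break(false /* logging_active */, false /* huge */);
}
```

This is why the comment above the call site is worth having: the invariant is enforced by the caller's ordering, not by the callee's arguments.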
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/12/2014 12:22 PM, Andy Lutomirski wrote:
On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:

What's the status of this series? I assume that it's too late for at least patches 2-5 to make it into 3.17.

Which tree were you hoping this patch series would go through?

I was assuming it would go through the x86 tree, since the bulk of the changes are in the x86 subsystem (hence my Acked-by). There's some argument that patch 1 should go through the kvm tree. There's no real need for patch 1 and 2-5 to end up in the same kernel release, either.

IIRC, Peter had some concerns, and I don't remember if they were all addressed. Peter?

I don't know. I rewrote one thing he didn't like and undid the other, but there's plenty of opportunity for this version to be problematic, too.

Sorry, I have been heads down on the current merge window. I will look at this for 3.18, presumably after Kernel Summit. The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness. Instead, it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different. I want to look over it more carefully before acking it, though.

Andy, are you going to be in Chicago?

-hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Aug 13, 2014 12:48 AM, H. Peter Anvin h...@zytor.com wrote:
On 08/12/2014 12:22 PM, Andy Lutomirski wrote:
On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:

What's the status of this series? I assume that it's too late for at least patches 2-5 to make it into 3.17.

Which tree were you hoping this patch series would go through?

I was assuming it would go through the x86 tree, since the bulk of the changes are in the x86 subsystem (hence my Acked-by). There's some argument that patch 1 should go through the kvm tree. There's no real need for patch 1 and 2-5 to end up in the same kernel release, either.

IIRC, Peter had some concerns, and I don't remember if they were all addressed. Peter?

I don't know. I rewrote one thing he didn't like and undid the other, but there's plenty of opportunity for this version to be problematic, too.

Sorry, I have been heads down on the current merge window. I will look at this for 3.18, presumably after Kernel Summit. The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness. Instead, it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different.

Fair enough. I meant seed as in something that initializes a PRNG (think srand), not seed as in a promised-to-be-cryptographically-secure seed for a DRBG. I can rename it, update the comment, or otherwise tweak it to make the intent clearer.

I want to look over it more carefully before acking it, though.

It would also be nice for someone with a Haswell box (and an RDSEED box) to test it. I have neither.

Andy, are you going to be in Chicago?

Yes.

-hpa
Re: The status about vhost-net on kvm-arm?
On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote:

Hello,

On Tue, Aug 12, 2014 at 5:41 AM, Li Liu john.li...@huawei.com wrote:

Hi all,

Can anyone tell me the current status of vhost-net on kvm-arm? Half a year has passed since Isa Ansharullah asked this question: http://www.spinics.net/lists/kvm-arm/msg08152.html

I have found two patches which provide kvm-arm support for eventfd and irqfd:

1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM
http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html

2) [RFC,v3] ARM: KVM: add irqfd and irq routing support
https://patches.linaro.org/32261/

And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan:

[Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio
https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

But there are no comments on this patch, and I can find nothing about qemu support for irqfd. Have I lost the track? If nobody is trying to fix this, we have a plan to complete it, with virtio-mmio supporting irqfd and multiqueue.

We at Virtual Open Systems did some work and tested vhost-net on ARM back in March. The setup was based on:
- host kernel with our ioeventfd patches: http://www.spinics.net/lists/kvm-arm/msg08413.html
- qemu with the aforementioned patches from Ying-Shiuan Pan: https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

The testbed was an ARM Chromebook with Exynos 5250, using a 1Gbps USB3 Ethernet adapter connected to a 1Gbps switch. I can't find the actual numbers, but I remember that with multiple streams the gain was clearly seen. Note that it used the minimum required ioeventfd implementation and not irqfd. I guess it is feasible to think that it can all be put together and rebased on the recent irqfd work; one could achieve even better performance because of the irqfd.
Managed to replicate the setup with the old versions we used in March. Single stream from another machine to the chromebook with the 1Gbps USB3 Ethernet adapter:

iperf -c address -P 1 -i 1 -p 5001 -f k -t 10
to HOST:  858316 Kbits/sec
to GUEST: 761563 Kbits/sec

10 parallel streams:

iperf -c address -P 10 -i 1 -p 5001 -f k -t 10
to HOST:  842420 Kbits/sec
to GUEST: 625144 Kbits/sec

___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

regards,
Nikolay Nikolaev
Virtual Open Systems
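As a quick sanity check on the figures above: the guest loses roughly 11% of host throughput with a single stream and roughly 26% with ten streams. The arithmetic (the helper name is ours, not from the thread):

```c
/* Back-of-the-envelope check on the iperf figures quoted above.
 * Returns the percentage of host throughput lost when running in the
 * guest; inputs are in Kbits/sec as reported by iperf -f k. */
static double overhead_pct(double host_kbits, double guest_kbits)
{
    return 100.0 * (host_kbits - guest_kbits) / host_kbits;
}
```

For example, overhead_pct(858316, 761563) comes out a bit above 11, and overhead_pct(842420, 625144) a bit below 26 — consistent with the observation that the multi-stream case is where irqfd would help most.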
[PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception
This patch emulates the debug registers and the debug exception to support guests using debug resources. This enables running gdb/kgdb etc. in a guest.

On the BOOKE architecture we cannot share debug resources between QEMU and the guest because:

When QEMU is using the debug resources, the debug exception must always be enabled. To achieve this we set MSR_DE and also set MSRP_DEP so the guest cannot change MSR_DE.

When emulating debug resources for the guest, we want the guest to control MSR_DE (enable/disable the debug interrupt on demand).

These two configurations cannot be supported at the same time, so the result is that we cannot share debug resources between QEMU and the guest on the BOOKE architecture. In the current design QEMU gets priority over the guest: if QEMU is using the debug resources then the guest cannot use them, and if the guest is using a debug resource then QEMU can overwrite it.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v3->v4
 - Clear only MRR on vcpu init

 arch/powerpc/include/asm/kvm_ppc.h   |   3 +
 arch/powerpc/include/asm/reg_booke.h |   2 +
 arch/powerpc/kvm/booke.c             |  42 +-
 arch/powerpc/kvm/booke_emulate.c     | 148 +++
 4 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fb86a22..05e58b6 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -206,6 +206,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server,
 extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
 extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);

+void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu);
+void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu);
+
 union kvmppc_one_reg {
 	u32	wval;
 	u64	dval;
diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index 464f108..150d485 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -307,6 +307,8 @@
 /*
  * DBSR bits which have conflicting
 * definitions on true Book E versus IBM 40x.
 */
#ifdef CONFIG_BOOKE
+#define DBSR_IDE	0x80000000	/* Imprecise Debug Event */
+#define DBSR_MRR	0x30000000	/* Most Recent Reset */
 #define DBSR_IC		0x08000000	/* Instruction Completion */
 #define DBSR_BT		0x04000000	/* Branch Taken */
 #define DBSR_IRPT	0x02000000	/* Exception Debug Event */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 074b7fc..6901862 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -267,6 +267,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu)
 	clear_bit(BOOKE_IRQPRIO_WATCHDOG, &vcpu->arch.pending_exceptions);
 }

+void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu)
+{
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG);
+}
+
+void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu)
+{
+	clear_bit(BOOKE_IRQPRIO_DEBUG, &vcpu->arch.pending_exceptions);
+}
+
 static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1)
 {
 	kvmppc_set_srr0(vcpu, srr0);
@@ -735,7 +745,32 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	struct debug_reg *dbg_reg = &(vcpu->arch.dbg_reg);
 	u32 dbsr = vcpu->arch.dbsr;

-	/* Clear guest dbsr (vcpu->arch.dbsr) */
+	if (vcpu->guest_debug == 0) {
+		/*
+		 * Debug resources belong to Guest.
+		 * Imprecise debug event is not injected
+		 */
+		if (dbsr & DBSR_IDE) {
+			dbsr &= ~DBSR_IDE;
+			if (!dbsr)
+				return RESUME_GUEST;
+		}
+
+		if (dbsr && (vcpu->arch.shared->msr & MSR_DE) &&
+		    (vcpu->arch.dbg_reg.dbcr0 & DBCR0_IDM))
+			kvmppc_core_queue_debug(vcpu);
+
+		/* Inject a program interrupt if trap debug is not allowed */
+		if ((dbsr & DBSR_TIE) && !(vcpu->arch.shared->msr & MSR_DE))
+			kvmppc_core_queue_program(vcpu, ESR_PTR);
+
+		return RESUME_GUEST;
+	}
+
+	/*
+	 * Debug resource owned by userspace.
+	 * Clear guest dbsr (vcpu->arch.dbsr)
+	 */
 	vcpu->arch.dbsr = 0;
 	run->debug.arch.status = 0;
 	run->debug.arch.address = vcpu->arch.pc;
@@ -1249,6 +1284,11 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
 	setup_timer(&vcpu->arch.wdt_timer, kvmppc_watchdog_func,
 		    (unsigned long)vcpu);

+	/*
+	 * Clear DBSR.MRR to avoid guest debug interrupt as
+	 * this is of host interest
+	 */
+	mtspr(SPRN_DBSR, DBSR_MRR);
 	return 0;
 }

diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index
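Abstracting away the kernel types, the routing decision the hunk above implements in kvmppc_handle_debug() can be sketched as a stand-alone model. This is a simplified illustration, not the kernel code: the enum and function names are invented; the DBSR bit values follow reg_booke.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define DBSR_IDE 0x80000000u  /* Imprecise Debug Event */
#define DBSR_TIE 0x01000000u  /* Trap Instruction Event */

enum debug_action {
    ACT_IGNORE,        /* nothing to deliver */
    ACT_QUEUE_DEBUG,   /* inject a debug interrupt into the guest */
    ACT_QUEUE_PROGRAM, /* inject a program interrupt (trap debug not allowed) */
    ACT_EXIT_TO_USER,  /* QEMU owns debug resources: report to userspace */
};

/* Simplified model of the dispatch in kvmppc_handle_debug(). */
static enum debug_action route_debug_event(bool qemu_owns_debug, uint32_t dbsr,
                                           bool msr_de, bool dbcr0_idm)
{
    if (qemu_owns_debug)       /* vcpu->guest_debug != 0: QEMU gets priority */
        return ACT_EXIT_TO_USER;

    dbsr &= ~DBSR_IDE;         /* imprecise debug events are not injected */
    if (!dbsr)
        return ACT_IGNORE;

    if (msr_de && dbcr0_idm)   /* guest has debug interrupts enabled itself */
        return ACT_QUEUE_DEBUG;

    if ((dbsr & DBSR_TIE) && !msr_de) /* trap while debug is not allowed */
        return ACT_QUEUE_PROGRAM;

    return ACT_IGNORE;
}
```

The key point mirrored here is the priority scheme from the commit message: when QEMU owns the debug resources, every event goes to userspace; only otherwise does the guest's own MSR_DE/DBCR0_IDM state decide what gets injected.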
[PATCH] KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
This was missed in the respective one_reg implementation patches.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 Documentation/virtual/kvm/api.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a21ff22..9177f23 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1878,6 +1878,8 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_ARCH_COMPAT | 32
   PPC   | KVM_REG_PPC_DABRX | 32
   PPC   | KVM_REG_PPC_WORT | 64
+  PPC   | KVM_REG_PPC_SPRG9 | 64
+  PPC   | KVM_REG_PPC_DBSR | 32
   PPC   | KVM_REG_PPC_TM_GPR0 | 64
   ...
   PPC   | KVM_REG_PPC_TM_GPR31 | 64
--
1.9.3
[PATCH v2] KVM: x86: check ISR and TMR to construct eoi exit bitmap
From: Yang Zhang yang.z.zh...@intel.com

The guest may mask an IOAPIC entry before issuing the EOI. In that case the EOI will not be intercepted by the hypervisor, because the corresponding bit in the EOI exit bitmap is not set. The solution is to check ISR + TMR to construct the EOI exit bitmap. This patch is a better fix for the issue that commit 0f6c0a740b tries to solve.

Tested-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 arch/x86/kvm/lapic.c | 17 +
 arch/x86/kvm/lapic.h |  2 ++
 arch/x86/kvm/x86.c   |  9 +
 virt/kvm/ioapic.c    |  7 ---
 4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 08e8a89..0ed4bcb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -515,6 +515,23 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
 }

+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+			      u32 *tmr)
+{
+	u32 i, reg_off, intr_in_service;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	for (i = 0; i < 8; i++) {
+		reg_off = 0x10 * i;
+		intr_in_service = apic_read_reg(apic, APIC_ISR + reg_off) &
+				  kvm_apic_get_reg(apic, APIC_TMR + reg_off);
+		if (intr_in_service) {
+			*((u32 *)eoi_exit_bitmap + i) |= intr_in_service;
+			tmr[i] |= intr_in_service;
+		}
+	}
+}
+
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..4ee3d70 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -53,6 +53,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);

+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+			      u32 *tmr);
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 204422d..755b556 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6005,6 +6005,15 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	memset(tmr, 0, 32);

 	kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
+	/*
+	 * The guest may mask an IOAPIC entry before issuing the EOI. In that
+	 * case the EOI will not be intercepted by the hypervisor, because the
+	 * corresponding bit in the EOI exit bitmap is not set.
+	 *
+	 * The solution is to check ISR + TMR to construct the EOI exit bitmap.
+	 */
+	kvm_apic_zap_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr);
+
 	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 	kvm_apic_update_tmr(vcpu, tmr);
 }
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index e8ce34c..2458a1d 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
 	spin_lock(&ioapic->lock);
 	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
 		e = &ioapic->redirtbl[index];
-		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
-		    index == RTC_GSI) {
+		if (!e->fields.mask &&
+		    (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+		     kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
+					  index) || index == RTC_GSI)) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
 				e->fields.dest_id, e->fields.dest_mode)) {
 				__set_bit(e->fields.vector,
--
1.7.1
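The core of the fix is folding in-service, level-triggered vectors back into the EOI exit bitmap. Stripped of the kernel types, the loop reduces to the following stand-alone sketch (the function name is ours; the 8-register layout mirrors the 0x10-spaced ISR/TMR registers iterated over in the patch):

```c
#include <stdint.h>

/* 256 APIC vectors tracked as 8 x 32-bit registers, matching the
 * APIC_ISR/APIC_TMR register layout walked by kvm_apic_zap_eoi_exitmap(). */
static void fold_isr_tmr_into_eoi_exitmap(const uint32_t isr[8],
                                          uint32_t tmr[8],
                                          uint32_t eoi_exit_bitmap[8])
{
    for (int i = 0; i < 8; i++) {
        /* vectors that are both in service (ISR) and level-triggered (TMR) */
        uint32_t intr_in_service = isr[i] & tmr[i];

        if (intr_in_service) {
            eoi_exit_bitmap[i] |= intr_in_service; /* keep intercepting EOI */
            tmr[i] |= intr_in_service;             /* preserve level semantics */
        }
    }
}
```

This guarantees that a vector already in service keeps its EOI intercepted even if the guest has since masked the IOAPIC entry, which is exactly the window the commit message describes.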
[PATCH 1/4] VFIO: PLATFORM: Add device tree info API and skeleton
This patch introduces the API to return device tree info about a PLATFORM device (if described by a device tree) and the skeleton of the implementation for VFIO_PLATFORM. Information about any device node bound by VFIO_PLATFORM should be queried via the introduced ioctl VFIO_DEVICE_GET_DEVTREE_INFO.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/Makefile                |  2 +-
 drivers/vfio/platform/devtree.c               | 27 ++
 drivers/vfio/platform/vfio_platform.c         | 11 +
 drivers/vfio/platform/vfio_platform_private.h |  7 ++
 include/uapi/linux/vfio.h                     | 32 ---
 5 files changed, 75 insertions(+), 4 deletions(-)
 create mode 100644 drivers/vfio/platform/devtree.c

diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
index 2c53327..4313fd7 100644
--- a/drivers/vfio/platform/Makefile
+++ b/drivers/vfio/platform/Makefile
@@ -1,4 +1,4 @@
-vfio-platform-y := vfio_platform.o vfio_platform_irq.o
+vfio-platform-y := vfio_platform.o vfio_platform_irq.o devtree.o

 obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
diff --git a/drivers/vfio/platform/devtree.c b/drivers/vfio/platform/devtree.c
new file mode 100644
index 000..91cab88
--- /dev/null
+++ b/drivers/vfio/platform/devtree.c
@@ -0,0 +1,27 @@
+#include <linux/slab.h>
+#include <linux/vfio.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include "vfio_platform_private.h"
+
+void vfio_platform_devtree_get(struct vfio_platform_device *vdev)
+{
+	vdev->of_node = of_node_get(vdev->pdev->dev.of_node);
+}
+
+void vfio_platform_devtree_put(struct vfio_platform_device *vdev)
+{
+	of_node_put(vdev->of_node);
+	vdev->of_node = NULL;
+}
+
+bool vfio_platform_has_devtree(struct vfio_platform_device *vdev)
+{
+	return !!vdev->of_node;
+}
+
+long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
+				 unsigned long arg)
+{
+	return -EINVAL; /* not implemented yet */
+}
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index f4c06c6..e6fe05a 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -26,6 +26,7 @@
 #include <linux/vfio.h>
 #include <linux/io.h>
 #include <linux/platform_device.h>
+#include <linux/of.h>
 #include <linux/irq.h>

 #include "vfio_platform_private.h"
@@ -66,6 +67,9 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)

 	vdev->num_regions = cnt;

+	/* get device tree node for info if available */
+	vfio_platform_devtree_get(vdev);
+
 	return 0;
 err:
 	kfree(vdev->region);
@@ -74,6 +78,7 @@ err:

 static void vfio_platform_regions_cleanup(struct vfio_platform_device *vdev)
 {
+	vfio_platform_devtree_put(vdev);
 	vdev->num_regions = 0;
 	kfree(vdev->region);
 }
@@ -132,6 +137,9 @@ static long vfio_platform_ioctl(void *device_data,
 			return -EINVAL;

 		info.flags = VFIO_DEVICE_FLAGS_PLATFORM;
+		if (vfio_platform_has_devtree(vdev))
+			info.flags |= VFIO_DEVICE_FLAGS_DEVTREE;
+
 		info.num_regions = vdev->num_regions;
 		info.num_irqs = vdev->num_irqs;

@@ -210,6 +218,9 @@ static long vfio_platform_ioctl(void *device_data,

 		return ret;

+	} else if (cmd == VFIO_DEVICE_GET_DEVTREE_INFO) {
+		return vfio_platform_devtree_ioctl(vdev, arg);
+
 	} else if (cmd == VFIO_DEVICE_RESET)
 		return -EINVAL;

diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 86a9201..1c42ba0 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -49,6 +49,7 @@ struct vfio_platform_device {
 	u32				num_regions;
 	struct vfio_platform_irq	*irq;
 	u32				num_irqs;
+	struct device_node		*of_node;
 };

 extern int vfio_platform_irq_init(struct vfio_platform_device *vdev);
@@ -59,4 +60,10 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
 					uint32_t flags, unsigned index,
 					unsigned start, unsigned count,
 					void *data);
+/* device tree info support in devtree.c */
+extern void vfio_platform_devtree_get(struct vfio_platform_device *vdev);
+extern void vfio_platform_devtree_put(struct vfio_platform_device *vdev);
+extern bool vfio_platform_has_devtree(struct vfio_platform_device *vdev);
+extern long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
+					unsigned long arg);
 #endif /* VFIO_PLATFORM_PRIVATE_H */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d381107..60f66ec 100644
--- a/include/uapi/linux/vfio.h
+++
[RFC 0/4] VFIO: PLATFORM: Return device tree info for a platform device node
This RFC's intention is to show what an interface to access device node properties for VFIO_PLATFORM can look like. If a device tree node corresponding to a platform device bound by VFIO_PLATFORM is available, this patch series allows the user to query the properties associated with this device node. This can be useful for userspace drivers that want to automatically query parameters related to the device.

An API to return data from a device's device tree has been proposed before on these lists. The API proposed here is slightly different: properties to parse from the device tree are not indexed by a numerical id. The host system doesn't guarantee any specific ordering for the available properties, or that those will remain the same; while this does not happen in practice, there is nothing stopping the host from changing the device nodes during operation. So properties are accessed by property name, and the type of the property accessed must also be known by the user.

Property types implemented in this RFC:
- VFIO_DEVTREE_ARR_TYPE_STRING (strings separated by the null character)
- VFIO_DEVTREE_ARR_TYPE_U32
- VFIO_DEVTREE_ARR_TYPE_U16
- VFIO_DEVTREE_ARR_TYPE_U8

These can all be accessed via the ioctl VFIO_DEVICE_GET_DEVTREE_INFO. A new ioctl was preferred instead of shoehorning the functionality into VFIO_DEVICE_GET_INFO. The structure exchanged looks like this:

/**
 * VFIO_DEVICE_GET_DEVTREE_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 16,
 *				       struct vfio_devtree_info)
 *
 * Retrieve information from the device's device tree, if available.
 * Caller will initialize data[] with a single string with the requested
 * devicetree property name, and type depending on whether an array of strings
 * or an array of u32 values is expected. On success, data[] will be extended
 * with the requested information, either as an array of u32, or with a list
 * of strings separated by the NULL terminating character.
 * Return: 0 on success, -errno on failure.
 */
struct vfio_devtree_info {
	__u32 argsz;
	__u32 type;
#define VFIO_DEVTREE_PROP_NAMES		0
#define VFIO_DEVTREE_ARR_TYPE_STRING	1
#define VFIO_DEVTREE_ARR_TYPE_U8	2
#define VFIO_DEVTREE_ARR_TYPE_U16	3
#define VFIO_DEVTREE_ARR_TYPE_U32	4
	__u32 length;
	__u8  data[];
};

#define VFIO_DEVICE_GET_DEVTREE_INFO	_IO(VFIO_TYPE, VFIO_BASE + 17)

The length of the property will be reported in length, so the user can reallocate the structure if the data does not fit the first time the call is used.

Specifically for QEMU, reading the compatible property of the device tree node could be of use to find out what device is being assigned to the guest, to handle a wider range of devices appropriately in the future, and to generate an appropriate device tree for the guest.

Antonios Motakis (4):
  VFIO: PLATFORM: Add device tree info API and skeleton
  VFIO: PLATFORM: DEVTREE: Return available property names
  VFIO: PLATFORM: DEVTREE: Access property as a list of strings
  VFIO: PLATFORM: DEVTREE: Return arrays of u32, u16, or u8

 drivers/vfio/platform/Makefile                |   2 +-
 drivers/vfio/platform/devtree.c               | 252 ++
 drivers/vfio/platform/vfio_platform.c         |  11 ++
 drivers/vfio/platform/vfio_platform_private.h |   7 +
 include/uapi/linux/vfio.h                     |  32 +++-
 5 files changed, 300 insertions(+), 4 deletions(-)
 create mode 100644 drivers/vfio/platform/devtree.c
--
1.8.3.2
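Since length reports the real property size on overflow, a userspace caller would typically use a grow-and-retry loop around the ioctl. Below is a sketch of that pattern with the real ioctl replaced by a hypothetical stub: fake_devtree_ioctl(), the "arm,pl011" compatible string, and query_compatible() are all invented for illustration and are not part of this RFC.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors struct vfio_devtree_info from the RFC above. */
struct vfio_devtree_info {
    uint32_t argsz;
    uint32_t type;
    uint32_t length;
    uint8_t  data[];
};

/* Hypothetical stand-in for ioctl(fd, VFIO_DEVICE_GET_DEVTREE_INFO, info):
 * always reports the full property length, but fills data[] only if it fits. */
static int fake_devtree_ioctl(struct vfio_devtree_info *info)
{
    static const char prop[] = "arm,pl011\0arm,primecell"; /* fake "compatible" */

    info->length = sizeof(prop);
    if (info->argsz < sizeof(*info) + sizeof(prop))
        return -EAGAIN;            /* buffer too small; length tells us how much */
    memcpy(info->data, prop, sizeof(prop));
    return 0;
}

/* Grow-and-retry: start with a header-only buffer, then enlarge it to the
 * length reported back until the property fits. Caller frees the result. */
static struct vfio_devtree_info *query_compatible(void)
{
    size_t sz = sizeof(struct vfio_devtree_info);
    struct vfio_devtree_info *info = calloc(1, sz);

    if (!info)
        return NULL;

    for (;;) {
        int ret;

        info->argsz = sz;
        ret = fake_devtree_ioctl(info);
        if (ret == 0)
            return info;
        if (ret != -EAGAIN) {
            free(info);
            return NULL;
        }
        sz = sizeof(*info) + info->length;
        info = realloc(info, sz);
        if (!info)
            return NULL;
    }
}
```

The same loop shape works for every property type in the RFC, since -EAGAIN plus an updated length is the uniform overflow signal.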
[PATCH 4/4] VFIO: PLATFORM: DEVTREE: Return arrays of u32, u16, or u8
Certain properties of a device tree node are accessible as an array of unsigned integers, either u32, u16, or u8. Let the VFIO user query this type of device node properties.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/devtree.c | 99 +
 1 file changed, 99 insertions(+)

diff --git a/drivers/vfio/platform/devtree.c b/drivers/vfio/platform/devtree.c
index 80c60d4..331cc34 100644
--- a/drivers/vfio/platform/devtree.c
+++ b/drivers/vfio/platform/devtree.c
@@ -98,6 +98,96 @@ static int devtree_get_full_name(struct device_node *np, void __user *datap,
 	return 0;
 }

+static int devtree_get_u32_arr(const struct device_node *np, const char *name,
+			       void __user *datap, unsigned long datasz)
+{
+	int ret;
+	int n;
+	u32 *out;
+
+	n = of_property_count_elems_of_size(np, name, sizeof(u32));
+	if (n < 0)
+		return n;
+
+	if (n * sizeof(u32) > datasz)
+		return -EAGAIN;
+
+	out = kcalloc(n, sizeof(u32), GFP_KERNEL);
+	if (!out)
+		return -EFAULT;
+
+	ret = of_property_read_u32_array(np, name, out, n);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(datap, out, n * sizeof(u32)))
+		ret = -EFAULT;
+
+out:
+	kfree(out);
+	return ret;
+}
+
+static int devtree_get_u16_arr(const struct device_node *np, const char *name,
+			       void __user *datap, unsigned long datasz)
+{
+	int ret;
+	int n;
+	u16 *out;
+
+	n = of_property_count_elems_of_size(np, name, sizeof(u16));
+	if (n < 0)
+		return n;
+
+	if (n * sizeof(u16) > datasz)
+		return -EAGAIN;
+
+	out = kcalloc(n, sizeof(u16), GFP_KERNEL);
+	if (!out)
+		return -EFAULT;
+
+	ret = of_property_read_u16_array(np, name, out, n);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(datap, out, n * sizeof(u16)))
+		ret = -EFAULT;
+
+out:
+	kfree(out);
+	return ret;
+}
+
+static int devtree_get_u8_arr(const struct device_node *np, const char *name,
+			      void __user *datap, unsigned long datasz)
+{
+	int ret;
+	int n;
+	u8 *out;
+
+	n = of_property_count_elems_of_size(np, name, sizeof(u8));
+	if (n < 0)
+		return n;
+
+	if (n * sizeof(u8) > datasz)
+		return -EAGAIN;
+
+	out = kcalloc(n, sizeof(u8), GFP_KERNEL);
+	if (!out)
+		return -EFAULT;
+
+	ret = of_property_read_u8_array(np, name, out, n);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(datap, out, n * sizeof(u8)))
+		ret = -EFAULT;
+
+out:
+	kfree(out);
+	return ret;
+}
+
 long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 				 unsigned long arg)
 {
@@ -143,6 +233,15 @@ long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 	} else if (info.type == VFIO_DEVTREE_ARR_TYPE_STRING)
 		ret = devtree_get_strings(vdev->of_node, name, datap, datasz);

+	else if (info.type == VFIO_DEVTREE_ARR_TYPE_U32)
+		ret = devtree_get_u32_arr(vdev->of_node, name, datap, datasz);
+
+	else if (info.type == VFIO_DEVTREE_ARR_TYPE_U16)
+		ret = devtree_get_u16_arr(vdev->of_node, name, datap, datasz);
+
+	else if (info.type == VFIO_DEVTREE_ARR_TYPE_U8)
+		ret = devtree_get_u8_arr(vdev->of_node, name, datap, datasz);
+
 	kfree(name);

 out:
--
1.8.3.2
[PATCH 3/4] VFIO: PLATFORM: DEVTREE: Access property as a list of strings
Certain device tree properties (e.g. the device node name, the compatible string) are available as a list of strings (separated by the null terminating character). Let the VFIO user query this type of properties.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/devtree.c | 60 +
 1 file changed, 60 insertions(+)

diff --git a/drivers/vfio/platform/devtree.c b/drivers/vfio/platform/devtree.c
index b8fd4138..80c60d4 100644
--- a/drivers/vfio/platform/devtree.c
+++ b/drivers/vfio/platform/devtree.c
@@ -61,6 +61,43 @@ static int devtree_get_prop_names(struct device_node *np, void __user *datap,
 	return ret;
 }

+static int devtree_get_strings(struct device_node *np, char *name,
+			       void __user *datap, unsigned long datasz)
+{
+	struct property *prop;
+	int len;
+
+	prop = of_find_property(np, name, &len);
+
+	if (!prop)
+		return -EINVAL;
+
+	if (len > datasz)
+		return -EAGAIN;
+
+	if (copy_to_user(datap, prop->value, len))
+		return -EFAULT;
+	else
+		return 0;
+}
+
+static int devtree_get_full_name(struct device_node *np, void __user *datap,
+				 unsigned long datasz, int *lenp)
+{
+	int len = strlen(np->full_name) + 1;
+
+	if (lenp)
+		*lenp = len;
+
+	if (len > datasz)
+		return -EAGAIN;
+
+	if (copy_to_user(datap, np->full_name, len))
+		return -EFAULT;
+
+	return 0;
+}
+
 long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 				 unsigned long arg)
 {
@@ -68,6 +105,7 @@ long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 	unsigned long minsz = offsetofend(struct vfio_devtree_info, length);
 	void __user *datap = (void __user *) arg + minsz;
 	unsigned long int datasz;
+	char *name;
 	int ret = -EINVAL;

 	if (!vfio_platform_has_devtree(vdev))
@@ -84,8 +122,30 @@ long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 	if (info.type == VFIO_DEVTREE_PROP_NAMES) {
 		ret = devtree_get_prop_names(vdev->of_node, datap, datasz,
 					     &info.length);
+		goto out;
 	}

+	name = kzalloc(datasz, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	if (copy_from_user(name, datap, datasz))
+		return -EFAULT;
+
+	if (!of_find_property(vdev->of_node, name, &info.length)) {
+		/* special case full_name as a property that is not on the fdt,
+		 * but we wish to return to the user as it includes the full
+		 * path of the device */
+		if (!strcmp(name, "full_name") &&
+		    (info.type == VFIO_DEVTREE_ARR_TYPE_STRING))
+			ret = devtree_get_full_name(vdev->of_node, datap,
+						    datasz, &info.length);
+
+	} else if (info.type == VFIO_DEVTREE_ARR_TYPE_STRING)
+		ret = devtree_get_strings(vdev->of_node, name, datap, datasz);
+
+	kfree(name);
+
+out:
 	if (copy_to_user((void __user *)arg, &info, minsz))
 		ret = -EFAULT;
--
1.8.3.2
[PATCH 2/4] VFIO: PLATFORM: DEVTREE: Return available property names
For various reasons, the available properties of the platform device node in the device tree node should be referred to by the property name. Passing type = VFIO_DEVTREE_PROP_NAMES to VFIO_DEVICE_GET_DEVTREE_INFO, returns a list of strings with the available properties that the VFIO user can access. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/platform/devtree.c | 68 - 1 file changed, 67 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/platform/devtree.c b/drivers/vfio/platform/devtree.c index 91cab88..b8fd4138 100644 --- a/drivers/vfio/platform/devtree.c +++ b/drivers/vfio/platform/devtree.c @@ -20,8 +20,74 @@ bool vfio_platform_has_devtree(struct vfio_platform_device *vdev) return !!vdev-of_node; } +static int devtree_get_prop_names(struct device_node *np, void __user *datap, + unsigned long datasz, int *lenp) +{ + struct property *prop; + int len = 0, sz; + int ret = 0; + + for_each_property_of_node(np, prop) { + sz = strlen(prop-name) + 1; + + if (datasz sz) { + ret = -EAGAIN; + break; + } + + if (copy_to_user(datap, prop-name, sz)) + return -EFAULT; + + datap += sz; + datasz -= sz; + len += sz; + } + + /* if overflow occurs, calculate remaining length */ + while (prop) { + len += strlen(prop-name) + 1; + prop = prop-next; + } + + /* we expose the full_name in addition to the usual properties */ + len += sz = strlen(full_name) + 1; + if (datasz sz) { + ret = -EAGAIN; + } else if (copy_to_user(datap, full_name, sz)) + return -EFAULT; + + *lenp = len; + + return ret; +} + long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev, unsigned long arg) { - return -EINVAL; /* not implemented yet */ + struct vfio_devtree_info info; + unsigned long minsz = offsetofend(struct vfio_devtree_info, length); + void __user *datap = (void __user *) arg + minsz; + unsigned long int datasz; + int ret = -EINVAL; + + if (!vfio_platform_has_devtree(vdev)) + return -EINVAL; + + if (copy_from_user(info, (void __user *)arg, minsz)) + 
return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + datasz = info.argsz - minsz; + + if (info.type == VFIO_DEVTREE_PROP_NAMES) { + ret = devtree_get_prop_names(vdev->of_node, datap, datasz, + &info.length); + } + + if (copy_to_user((void __user *)arg, &info, minsz)) + ret = -EFAULT; + + return ret; } -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The status about vhost-net on kvm-arm?
On 2014/8/13 17:10, Nikolay Nikolaev wrote: On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: Hello, On Tue, Aug 12, 2014 at 5:41 AM, Li Liu john.li...@huawei.com wrote: Hi all, Can anyone tell me the current status of vhost-net on kvm-arm? Half a year has passed since Isa Ansharullah asked this question: http://www.spinics.net/lists/kvm-arm/msg08152.html I have found two patches which have provided the kvm-arm support of eventfd and irqfd: 1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html 2) [RFC,v3] ARM: KVM: add irqfd and irq routing support https://patches.linaro.org/32261/ And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan: [Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html But there are no comments on this patch, and I can find nothing about qemu support for irqfd. Have I lost track? If nobody is trying to fix it, we have a plan to complete irqfd and multiqueue support for virtio-mmio. we at Virtual Open Systems did some work and tested vhost-net on ARM back in March. The setup was based on: - host kernel with our ioeventfd patches: http://www.spinics.net/lists/kvm-arm/msg08413.html - qemu with the aforementioned patches from Ying-Shiuan Pan https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html The testbed was an ARM Chromebook with Exynos 5250, using a 1Gbps USB3 Ethernet adapter connected to a 1Gbps switch. I can't find the actual numbers but I remember that with multiple streams the gain was clearly seen. Note that it used the minimum required ioeventfd implementation and not irqfd. I guess it is feasible to think that it all can be put together and rebased + the recent irqfd work. One can achieve even better performance (because of the irqfd).
Managed to replicate the setup with the old versions we used in March: Single stream from another machine to chromebook with 1Gbps USB3 Ethernet adapter. iperf -c address -P 1 -i 1 -p 5001 -f k -t 10 to HOST: 858316 Kbits/sec to GUEST: 761563 Kbits/sec 10 parallel streams iperf -c address -P 10 -i 1 -p 5001 -f k -t 10 to HOST: 842420 Kbits/sec to GUEST: 625144 Kbits/sec Appreciate your work. Is it convenient for you to test the same cases without vhost=on? Then the results will clearly show the performance improvement from ioeventfd alone. I will try to test it with a Hisilicon board, which is ongoing. Best regards Li ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm regards, Nikolay Nikolaev Virtual Open Systems .
[PATCH] Test case for VFIO_PLATFORM returning device tree info
This is a test case complementing the patch series: [RFC 0/4] VFIO: PLATFORM: Return device tree info for a platform device node This test case is based on the ARM PL330 DMA controller and shows how device node properties can be accessed from userspace. It doesn't apply on anything in particular; it is sent in patch format on the ML merely for reading convenience. It can be pulled from: g...@github.com:virtualopensystems/vfio-devtree-test.git --- vfio-dt.c | 172 ++ 1 file changed, 172 insertions(+) create mode 100644 vfio-dt.c diff --git a/vfio-dt.c b/vfio-dt.c new file mode 100644 index 000..6e671dd --- /dev/null +++ b/vfio-dt.c @@ -0,0 +1,172 @@ +#include <stdio.h> +#include <sys/fcntl.h> +#include <sys/mman.h> +#include <linux/vfio.h> +#include <sys/eventfd.h> +#include <stdlib.h> +#include <unistd.h> +#include <errno.h> +#include <string.h> + +#define VFIO_DEVICE_FLAGS_DEVTREE (1 << 3) /* device tree metadata */ + +struct vfio_devtree_info { + __u32 argsz; + __u32 type; +#define VFIO_DEVTREE_PROP_NAMES 0 +#define VFIO_DEVTREE_ARR_TYPE_STRING 1 +#define VFIO_DEVTREE_ARR_TYPE_U8 2 +#define VFIO_DEVTREE_ARR_TYPE_U16 3 +#define VFIO_DEVTREE_ARR_TYPE_U32 4 + __u32 length; + __u8 data[]; +}; +#define VFIO_DEVICE_GET_DEVTREE_INFO _IO(VFIO_TYPE, VFIO_BASE + 17) + +static void vfio_pr_devtree_prop(int device, char *name, unsigned int type) +{ + static unsigned int length = 0; + static struct vfio_devtree_info *devtree = NULL; + int ret; + + if (length < strlen(name) + 1) + length = strlen(name) + 1; + + while (1) { + unsigned int argsz = sizeof(struct vfio_devtree_info) + length; + devtree = realloc(devtree, argsz); + devtree->argsz = argsz; + devtree->type = type; + strcpy(devtree->data, name); + + ret = ioctl(device, VFIO_DEVICE_GET_DEVTREE_INFO, devtree); + + if (length < devtree->length) + length = devtree->length; + else + break; + } + + if (ret) { + printf("%s = error %d\n", name, ret); + } else if (type == VFIO_DEVTREE_ARR_TYPE_STRING || + type == VFIO_DEVTREE_PROP_NAMES) { + int i; + printf("%s =",
name); + for (i = 0; i < devtree->length; i += strlen(devtree->data + i) + 1) + printf(" \"%s\"", devtree->data + i); + printf("\n"); + + } else if (type == VFIO_DEVTREE_ARR_TYPE_U32 || + type == VFIO_DEVTREE_ARR_TYPE_U16 || + type == VFIO_DEVTREE_ARR_TYPE_U8) { + long unsigned int *uarr = (long unsigned int *) devtree->data; + + printf("%s =", name); + while ((__u8 *)uarr < devtree->data + devtree->length) { + printf(" 0x%lx", *uarr); + uarr++; + } + printf("\n"); + } +} + +int main (int argc, char **argv) { + + int container, group, device; + unsigned int i; + + struct vfio_group_status group_status = { .argsz = sizeof(group_status) }; + struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) }; + struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) }; + struct vfio_device_info device_info = { .argsz = sizeof(device_info) }; + + int ret; + + if (argc != 3) { + printf("Usage: ./vfio-dt /dev/vfio/${group_id} device_id\n"); + return 2; + } + + /* Create a new container */ + container = open("/dev/vfio/vfio", O_RDWR); + + if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION) { + printf("Unknown API version\n"); + return 1; + } + + if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) { + printf("Doesn't support the IOMMU driver we want\n"); + return 1; + } + + /* Open the group */ + group = open(argv[1], O_RDWR); + + /* Test the group is viable and available */ + ioctl(group, VFIO_GROUP_GET_STATUS, &group_status); + + if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) { + printf("Group is not viable (not all devices bound for vfio)\n"); + return 1; + } + + /* Add the group to the container */ + ioctl(group, VFIO_GROUP_SET_CONTAINER, &container); + + /* Enable the IOMMU model we want */ + ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU); + + /* Get additional IOMMU info */ + ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info); + + /* Get a file descriptor for the device */ + device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, argv[2]); + printf("=== VFIO device file
descriptor %d ===\n", device); + + /* Test and setup the
Re: The status about vhost-net on kvm-arm?
On Wed, Aug 13, 2014 at 12:10 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: Hello, On Tue, Aug 12, 2014 at 5:41 AM, Li Liu john.li...@huawei.com wrote: Hi all, Can anyone tell me the current status of vhost-net on kvm-arm? Half a year has passed since Isa Ansharullah asked this question: http://www.spinics.net/lists/kvm-arm/msg08152.html I have found two patches which have provided the kvm-arm support of eventfd and irqfd: 1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html 2) [RFC,v3] ARM: KVM: add irqfd and irq routing support https://patches.linaro.org/32261/ And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan: [Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html But there are no comments on this patch, and I can find nothing about qemu support for irqfd. Have I lost track? If nobody is trying to fix it, we have a plan to complete irqfd and multiqueue support for virtio-mmio. we at Virtual Open Systems did some work and tested vhost-net on ARM back in March. The setup was based on: - host kernel with our ioeventfd patches: http://www.spinics.net/lists/kvm-arm/msg08413.html - qemu with the aforementioned patches from Ying-Shiuan Pan https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html The testbed was an ARM Chromebook with Exynos 5250, using a 1Gbps USB3 Ethernet adapter connected to a 1Gbps switch. I can't find the actual numbers but I remember that with multiple streams the gain was clearly seen. Note that it used the minimum required ioeventfd implementation and not irqfd. I guess it is feasible to think that it all can be put together and rebased + the recent irqfd work. One can achieve even better performance (because of the irqfd).
Managed to replicate the setup with the old versions we used in March: Single stream from another machine to chromebook with 1Gbps USB3 Ethernet adapter. iperf -c address -P 1 -i 1 -p 5001 -f k -t 10 to HOST: 858316 Kbits/sec to GUEST: 761563 Kbits/sec to GUEST vhost=off: 508150 Kbits/sec 10 parallel streams iperf -c address -P 10 -i 1 -p 5001 -f k -t 10 to HOST: 842420 Kbits/sec to GUEST: 625144 Kbits/sec to GUEST vhost=off: 425276 Kbits/sec ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm regards, Nikolay Nikolaev Virtual Open Systems
Re: [PATCH v4] arm64: fix VTTBR_BADDR_MASK
On Tue, Aug 12, 2014 at 06:05:21PM +0200, Christoffer Dall wrote: On Mon, Aug 11, 2014 at 03:38:23PM -0500, Joel Schopp wrote: The current VTTBR_BADDR_MASK only masks 39 bits, which is broken on current systems. Rather than just add a bit, it seems like a good time to also set things at run-time instead of compile time to accommodate more hardware. This patch sets TCR_EL2.PS, VTCR_EL2.T0SZ and vttbr_baddr_mask at runtime, not compile time. In ARMv8, the EL2 physical address size (TCR_EL2.PS) and stage2 input address size (VTCR_EL2.T0SZ) cannot be determined at compile time since they depend on hardware capability. According to Table D4-23 and Table D4-25 in the ARM DDI 0487A.b document, vttbr_x is calculated using different fixed values with consideration of T0SZ, granule size and the level of translation tables. Therefore, vttbr_baddr_mask should be determined dynamically. Changes since v3: Another rebase Addressed minor comments from v2 Changes since v2: Rebased on https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next branch Changes since v1: Rebased fix on Jungseok Lee's patch https://lkml.org/lkml/2014/5/12/189 to provide a better long term fix. Updated that patch to log an error instead of silently failing on an unaligned vttbr.
Cc: Christoffer Dall christoffer.d...@linaro.org Cc: Sungjinn Chung sungjinn.ch...@samsung.com Signed-off-by: Jungseok Lee jays@samsung.com Signed-off-by: Joel Schopp joel.sch...@amd.com --- arch/arm/kvm/arm.c | 116 +- arch/arm64/include/asm/kvm_arm.h | 17 +- arch/arm64/kvm/hyp-init.S| 20 +-- 3 files changed, 131 insertions(+), 22 deletions(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 3c82b37..b4859fa 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -37,6 +37,7 @@ #include <asm/mman.h> #include <asm/tlbflush.h> #include <asm/cacheflush.h> +#include <asm/cputype.h> #include <asm/virt.h> #include <asm/kvm_arm.h> #include <asm/kvm_asm.h> @@ -61,6 +62,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1); static u8 kvm_next_vmid; static DEFINE_SPINLOCK(kvm_vmid_lock); +static u64 vttbr_baddr_mask; + static bool vgic_present; static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu) @@ -412,6 +415,103 @@ static bool need_new_vmid_gen(struct kvm *kvm) return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen)); } + + + /* +* ARMv8 64K architecture limitations: +* 16 <= T0SZ <= 21 is valid under 3 levels of translation tables +* 18 <= T0SZ <= 34 is valid under 2 levels of translation tables +* 31 <= T0SZ <= 39 is valid under 1 level of translation tables +* +* ARMv8 4K architecture limitations: +* 16 <= T0SZ <= 24 is valid under 4 levels of translation tables +* 21 <= T0SZ <= 30 is valid under 3 levels of translation tables this is still wrong, as I pointed out, it should be 21 <= T0SZ <= 30 typo: I meant: 21 <= T0SZ <= 33 -Christoffer
Re: [PATCH] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy a...@ozlabs.ru writes: fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change but this is not true as it calls get_order() (which takes bytes) where it should have called ilog2() and the kernel stops on VM_BUG_ON(). This replaces get_order() with ilog2(). Should we round it up? i.e., ilog2(kvm_rma_pages - 1) + 1? Suggested-by: Paul Mackerras pau...@samba.org Cc: Alexander Graf ag...@suse.de Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Cc: Joonsoo Kim iamjoonsoo@lge.com Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: sta...@vger.kernel.org Why stable? We merged it this merge window. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 329d7fd..bfe9f01 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma() ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL); if (!ri) return NULL; - page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages)); + page = cma_alloc(kvm_cma, kvm_rma_pages, ilog2(kvm_rma_pages)); if (!page) goto err_out; atomic_set(&ri->use_count, 1); @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages) { unsigned long align_pages = HPT_ALIGN_PAGES; - VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); + VM_BUG_ON(ilog2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); /* Old CPUs require HPT aligned on a multiple of its size */ if (!cpu_has_feature(CPU_FTR_ARCH_206)) align_pages = nr_pages; - return cma_alloc(kvm_cma, nr_pages, ilog2(align_pages)); } EXPORT_SYMBOL_GPL(kvm_alloc_hpt); -- 2.0.0
Re: [PATCH] vhost: Add polling mode
On Tue, Aug 12, 2014 at 01:57:05PM +0300, Razya Ladelsky wrote: Michael S. Tsirkin m...@redhat.com wrote on 12/08/2014 12:18:50 PM: From: Michael S. Tsirkin m...@redhat.com To: David Miller da...@davemloft.net Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org, Alex Glikson/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, abel.gor...@gmail.com, linux-ker...@vger.kernel.org, net...@vger.kernel.org, virtualizat...@lists.linux-foundation.org Date: 12/08/2014 12:18 PM Subject: Re: [PATCH] vhost: Add polling mode On Mon, Aug 11, 2014 at 12:46:21PM -0700, David Miller wrote: From: Michael S. Tsirkin m...@redhat.com Date: Sun, 10 Aug 2014 21:45:59 +0200 On Sun, Aug 10, 2014 at 11:30:35AM +0300, Razya Ladelsky wrote: ... And, did your tests actually produce 100% load on both host CPUs? ... Michael, please do not quote an entire patch just to ask a one line question. I truly, truly, wish it was simpler in modern email clients to delete the unrelated quoted material because I bet when people do this they are simply being lazy. Thank you. Lazy - mea culpa, though I'm using mutt so it isn't even hard. The question still stands: the test results are only valid if CPU was at 100% in all configurations. This is the reason I generally prefer it when people report throughput divided by CPU (power would be good too but it still isn't easy for people to get that number). Hi Michael, Sorry for the delay, had some problems with my mailbox, and I realized just now that my reply wasn't sent. The vm indeed ALWAYS utilized 100% cpu, whether polling was enabled or not. The vhost thread utilized less than 100% (of the other cpu) when polling was disabled. Enabling polling increased its utilization to 100% (in which case both cpus were 100% utilized). Hmm this means the testing wasn't successful then, as you said: The idea was to get it 100% loaded, so we can see that the polling is getting it to produce higher throughput. 
in fact here you are producing more throughput but spending more power to produce it, which can have any number of explanations besides polling improving the efficiency. For example, increasing system load might disable host power management. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Qemu: Fix eax for cpuid leaf 0x40000000
Il 12/08/2014 21:29, Eduardo Habkost ha scritto: On Tue, Aug 12, 2014 at 09:12:00PM +0200, Paolo Bonzini wrote: Il 12/08/2014 20:55, Eduardo Habkost ha scritto: This makes the CPUID data change under the guest's feet during live-migration. Adding compat code to ensure older machine-types keep the old behavior is necessary, but in this specific case it is mostly harmless because 0x0 is documented as being equivalent to 0x4001. (But I don't know how guests are supposed to behave when they see CPUID[KVM_CPUID_SIGNATURE_NEXT].EAX==0.) The only obvious thing to do would be to treat it as 0x4101. I just want to be sure the guests really do that. If we know guests won't do anything different with the CPUID change, I won't mind having no compat code for this. Considering that only two leaves are defined for KVM, and both are mandatory I don't think current guests have any reason to look at CPUID[KVM_CPUID_SIGNATURE | kvm_base].EAX at all. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly
Commit d40a6898e5 mistakenly caused instructions which are not marked as EmulateOnUD to be emulated upon #UD exception. The commit caused the check of whether the instruction flags include EmulateOnUD to never be evaluated. As a result, instructions whose emulation is broken may be emulated. This fix moves the evaluation of EmulateOnUD so it is evaluated. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 56657b0..37a83b2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4394,6 +4394,9 @@ done_prefixes: ctxt->execute = opcode.u.execute; + if (!(ctxt->d & EmulateOnUD) && ctxt->ud) + return EMULATION_FAILED; + if (unlikely(ctxt->d & (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) { /* @@ -4406,9 +4409,6 @@ done_prefixes: if (ctxt->d & NotImpl) return EMULATION_FAILED; - if (!(ctxt->d & EmulateOnUD) && ctxt->ud) - return EMULATION_FAILED; - if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack)) ctxt->op_bytes = 8; -- 1.9.1
Re: [PATCH v4] arm64: fix VTTBR_BADDR_MASK
On Aug 13, 2014, at 8:33 PM, Christoffer Dall wrote: On Tue, Aug 12, 2014 at 06:05:21PM +0200, Christoffer Dall wrote: On Mon, Aug 11, 2014 at 03:38:23PM -0500, Joel Schopp wrote: The current VTTBR_BADDR_MASK only masks 39 bits, which is broken on current systems. Rather than just add a bit, it seems like a good time to also set things at run-time instead of compile time to accommodate more hardware. This patch sets TCR_EL2.PS, VTCR_EL2.T0SZ and vttbr_baddr_mask at runtime, not compile time. In ARMv8, the EL2 physical address size (TCR_EL2.PS) and stage2 input address size (VTCR_EL2.T0SZ) cannot be determined at compile time since they depend on hardware capability. According to Table D4-23 and Table D4-25 in the ARM DDI 0487A.b document, vttbr_x is calculated using different fixed values with consideration of T0SZ, granule size and the level of translation tables. Therefore, vttbr_baddr_mask should be determined dynamically. Changes since v3: Another rebase Addressed minor comments from v2 Changes since v2: Rebased on https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next branch Changes since v1: Rebased fix on Jungseok Lee's patch https://lkml.org/lkml/2014/5/12/189 to provide a better long term fix. Updated that patch to log an error instead of silently failing on an unaligned vttbr.
Cc: Christoffer Dall christoffer.d...@linaro.org Cc: Sungjinn Chung sungjinn.ch...@samsung.com Signed-off-by: Jungseok Lee jays@samsung.com Signed-off-by: Joel Schopp joel.sch...@amd.com --- arch/arm/kvm/arm.c | 116 +- arch/arm64/include/asm/kvm_arm.h | 17 +- arch/arm64/kvm/hyp-init.S| 20 +-- 3 files changed, 131 insertions(+), 22 deletions(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 3c82b37..b4859fa 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -37,6 +37,7 @@ #include <asm/mman.h> #include <asm/tlbflush.h> #include <asm/cacheflush.h> +#include <asm/cputype.h> #include <asm/virt.h> #include <asm/kvm_arm.h> #include <asm/kvm_asm.h> @@ -61,6 +62,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1); static u8 kvm_next_vmid; static DEFINE_SPINLOCK(kvm_vmid_lock); +static u64 vttbr_baddr_mask; + static bool vgic_present; static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu) @@ -412,6 +415,103 @@ static bool need_new_vmid_gen(struct kvm *kvm) return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen)); } + + + /* +* ARMv8 64K architecture limitations: +* 16 <= T0SZ <= 21 is valid under 3 levels of translation tables +* 18 <= T0SZ <= 34 is valid under 2 levels of translation tables +* 31 <= T0SZ <= 39 is valid under 1 level of translation tables +* +* ARMv8 4K architecture limitations: +* 16 <= T0SZ <= 24 is valid under 4 levels of translation tables +* 21 <= T0SZ <= 30 is valid under 3 levels of translation tables this is still wrong, as I pointed out, it should be 21 <= T0SZ <= 30 typo: I meant: 21 <= T0SZ <= 33 Christoffer is right. The original patch, [1], described the conditions incorrectly. [1]: https://lkml.org/lkml/2014/5/12/189 - Jungseok Lee
Re: [PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly
Correction: the word “never” in the message is too harsh. Nonetheless, there is a regression bug. I encountered it with the “wrfsbase” instruction. Nadav On Aug 13, 2014, at 4:50 PM, Nadav Amit na...@cs.technion.ac.il wrote: Commit d40a6898e5 mistakenly caused instructions which are not marked as EmulateOnUD to be emulated upon #UD exception. The commit caused the check of whether the instruction flags include EmulateOnUD to never be evaluated. As a result, instructions whose emulation is broken may be emulated. This fix moves the evaluation of EmulateOnUD so it is evaluated. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 56657b0..37a83b2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4394,6 +4394,9 @@ done_prefixes: ctxt->execute = opcode.u.execute; + if (!(ctxt->d & EmulateOnUD) && ctxt->ud) + return EMULATION_FAILED; + if (unlikely(ctxt->d & (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) { /* @@ -4406,9 +4409,6 @@ done_prefixes: if (ctxt->d & NotImpl) return EMULATION_FAILED; - if (!(ctxt->d & EmulateOnUD) && ctxt->ud) - return EMULATION_FAILED; - if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack)) ctxt->op_bytes = 8; -- 1.9.1
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 12:48:41AM -0700, H. Peter Anvin wrote: The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness, instead it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different. Without getting into an argument about which definition of seed is correct --- it's certainly confusing and different from the RDSEED usage of the word seed. Do we expect that anyone else besides arch_get_rng_seed() would actually want to use it? I'd argue no; we want the rest of the kernel to either use get_random_bytes() or prandom_u32(). Given that, maybe we should just call it arch_random_init(), and expect that the only user of this interface would be drivers/char/random.c? - Ted
Re: [RFC PATCH] ARM: KVM: add irqfd support
On Mon, Aug 04, 2014 at 02:08:22PM +0200, Eric Auger wrote: This patch enables irqfd on ARM. The irqfd framework makes it possible to inject a virtual IRQ into a guest upon an eventfd trigger. Userspace uses the KVM_IRQFD VM ioctl to provide KVM with a kvm_irqfd struct that associates a VM, an eventfd, and an IRQ number (aka. the gsi). When an actor signals the eventfd (typically a VFIO platform driver), the kvm irqfd subsystem injects the provided virtual IRQ into the guest. The gsi must correspond to a shared peripheral interrupt (SPI), ie the GIC interrupt ID is gsi+32. Why can't we support PPIs? CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD are turned on. No IRQ routing table is used, thanks to Paul Mackerras' patch series IRQFD without IRQ routing, enabled for XICS (https://www.mail-archive.com/kvm@vger.kernel.org/msg104478.html) Signed-off-by: Eric Auger eric.au...@linaro.org --- This patch would deprecate the previous patch featuring GSI routing (https://patches.linaro.org/32261/) irqchip.c and irq_comm.c are not used at all. This RFC applies on top of Christoffer Dall's series arm/arm64: KVM: Various VGIC cleanups and improvements https://lists.cs.columbia.edu/pipermail/kvmarm/2014-June/009979.html All pieces can be found on git://git.linaro.org/people/eric.auger/linux.git branch irqfd_integ_v4 This work was tested with the Calxeda Midway xgmac main interrupt with qemu-system-arm and the QEMU VFIO platform device. --- Documentation/virtual/kvm/api.txt | 5 +++- arch/arm/include/uapi/asm/kvm.h | 3 +++ arch/arm/kvm/Kconfig | 3 ++- arch/arm/kvm/Makefile | 2 +- arch/arm/kvm/irq.h| 25 ++ virt/kvm/arm/vgic.c | 54 --- 6 files changed, 85 insertions(+), 7 deletions(-) create mode 100644 arch/arm/kvm/irq.h diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 0fe3649..04310d9 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2132,7 +2132,7 @@ into the hash PTE second double word).
4.75 KVM_IRQFD Capability: KVM_CAP_IRQFD -Architectures: x86 s390 +Architectures: x86 s390 arm Type: vm ioctl Parameters: struct kvm_irqfd (in) Returns: 0 on success, -1 on error @@ -2158,6 +2158,9 @@ Note that closing the resamplefd is not sufficient to disable the irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. +On ARM/arm64 the injected interrupt must be a shared peripheral interrupt (SPI). +This means the programmed GIC interrupt ID is gsi+32. + 4.76 KVM_PPC_ALLOCATE_HTAB Capability: KVM_CAP_PPC_ALLOC_HTAB diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h index e6ebdd3..3034c66 100644 --- a/arch/arm/include/uapi/asm/kvm.h +++ b/arch/arm/include/uapi/asm/kvm.h @@ -194,6 +194,9 @@ struct kvm_arch_memory_slot { /* Highest supported SPI, from VGIC_NR_IRQS */ #define KVM_ARM_IRQ_GIC_MAX 127 +/* One single KVM irqchip, ie. the VGIC */ +#define KVM_NR_IRQCHIPS 1 + /* PSCI interface */ #define KVM_PSCI_FN_BASE 0x95c1ba5e #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n)) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 4be5bb1..7800261 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -24,6 +24,7 @@ config KVM select KVM_MMIO select KVM_ARM_HOST depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN + select HAVE_KVM_EVENTFD ---help--- Support hosting virtualized guest machines. You will also need to select one or more of the processor modules below. @@ -55,7 +56,7 @@ config KVM_ARM_MAX_VCPUS config KVM_ARM_VGIC bool "KVM support for Virtual GIC" depends on KVM_ARM_HOST && OF - select HAVE_KVM_IRQCHIP + select HAVE_KVM_IRQFD default y ---help--- Adds support for a hardware assisted, in-kernel GIC emulation.
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile index 789bca9..2fa2f82 100644 --- a/arch/arm/kvm/Makefile +++ b/arch/arm/kvm/Makefile @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt) KVM := ../../../virt/kvm -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o obj-y += kvm-arm.o init.o interrupts.o obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h new file mode 100644 index 000..1275d91 --- /dev/null +++ b/arch/arm/kvm/irq.h @@ -0,0 +1,25 @@ +/* + * Copyright (C) 2014 Linaro Ltd. + * Authors: Eric Auger eric.au...@linaro.org + * + * This program is free software; you can redistribute it and/or
[GIT PULL] VFIO updates for 3.17-rc1
Hi Linus, The following changes since commit 7f0d32e0c1a7a23216a0f2694ec841f60e9dddfd: Merge tag 'microblaze-3.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze (2014-08-07 09:02:26 -0700) are available in the git repository at: git://github.com/awilliam/linux-vfio.git tags/vfio-v3.17-rc1 for you to fetch changes up to 9b936c960f22954bfb89f2fefd8f96916bb42908: drivers/vfio: Enable VFIO if EEH is not supported (2014-08-08 10:39:16 -0600) VFIO updates for v3.17-rc1 - Enable support for bus reset on device release - Fixes for EEH support Alex Williamson (3): vfio-pci: Release devices with BusMaster disabled vfio-pci: Use mutex around open, release, and remove vfio-pci: Attempt bus/slot reset on release Alexey Kardashevskiy (2): drivers/vfio: Allow EEH to be built as module drivers/vfio: Enable VFIO if EEH is not supported Gavin Shan (1): drivers/vfio: Fix EEH build error drivers/vfio/Kconfig| 6 ++ drivers/vfio/Makefile | 2 +- drivers/vfio/pci/vfio_pci.c | 161 drivers/vfio/pci/vfio_pci_private.h | 3 +- drivers/vfio/vfio_spapr_eeh.c | 17 +++- include/linux/vfio.h| 6 +- 6 files changed, 170 insertions(+), 25 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] KVM: SVM: add rdmsr support for AMD event registers
Current KVM only supports RDMSR for K7_EVNTSEL0 and K7_EVNTSEL0 MSRs. Reading the rest MSRs will trigger KVM to inject #GP into guest VM. This causes a warning message Failed to access perfctr msr (MSR c0010001 is ) on AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_EVNTSELn registers and thus supresses the warning message. Signed-off-by: Wei Huang wehu...@redhat.com --- arch/x86/kvm/x86.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ef432f8..3f10ca2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2399,7 +2399,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_K7_HWCR: case MSR_VM_HSAVE_PA: case MSR_K7_EVNTSEL0: + case MSR_K7_EVNTSEL1: + case MSR_K7_EVNTSEL2: + case MSR_K7_EVNTSEL3: case MSR_K7_PERFCTR0: + case MSR_K7_PERFCTR1: + case MSR_K7_PERFCTR2: + case MSR_K7_PERFCTR3: case MSR_K8_INT_PENDING_MSG: case MSR_AMD64_NB_CFG: case MSR_FAM10H_MMIO_CONF_BASE: -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: SVM: add rdmsr support for AMD event registers
Wrong one, sorry. Please discard this one. Updated one will follow. -Wei On 08/13/2014 10:58 AM, Wei Huang wrote: Current KVM only supports RDMSR for K7_EVNTSEL0 and K7_EVNTSEL0 MSRs. Reading the rest MSRs will trigger KVM to inject #GP into guest VM. This causes a warning message Failed to access perfctr msr (MSR c0010001 is ) on AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_EVNTSELn registers and thus supresses the warning message. Signed-off-by: Wei Huang wehu...@redhat.com --- arch/x86/kvm/x86.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ef432f8..3f10ca2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2399,7 +2399,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_K7_HWCR: case MSR_VM_HSAVE_PA: case MSR_K7_EVNTSEL0: + case MSR_K7_EVNTSEL1: + case MSR_K7_EVNTSEL2: + case MSR_K7_EVNTSEL3: case MSR_K7_PERFCTR0: + case MSR_K7_PERFCTR1: + case MSR_K7_PERFCTR2: + case MSR_K7_PERFCTR3: case MSR_K8_INT_PENDING_MSG: case MSR_AMD64_NB_CFG: case MSR_FAM10H_MMIO_CONF_BASE: -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] KVM: SVM: add rdmsr support for AMD event registers
Current KVM only supports RDMSR for the K7_EVNTSEL0 and K7_PERFCTR0 MSRs. Reading the rest of these MSRs triggers KVM to inject #GP into the guest VM. This causes a warning message "Failed to access perfctr msr (MSR c0010001 is )" on an AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_PERFCTRn registers and thus suppresses the warning message.

Signed-off-by: Wei Huang wehu...@redhat.com
---
 arch/x86/kvm/x86.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef432f8..3f10ca2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2399,7 +2399,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_K7_HWCR:
 	case MSR_VM_HSAVE_PA:
 	case MSR_K7_EVNTSEL0:
+	case MSR_K7_EVNTSEL1:
+	case MSR_K7_EVNTSEL2:
+	case MSR_K7_EVNTSEL3:
 	case MSR_K7_PERFCTR0:
+	case MSR_K7_PERFCTR1:
+	case MSR_K7_PERFCTR2:
+	case MSR_K7_PERFCTR3:
 	case MSR_K8_INT_PENDING_MSG:
 	case MSR_AMD64_NB_CFG:
 	case MSR_FAM10H_MMIO_CONF_BASE:
--
1.8.3.1
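The dispatch pattern the patch extends can be sketched outside of KVM as follows. The MSR index values below are the architectural AMD K7 ones, but `rdmsr_default_zero()` is an illustrative stand-in for `kvm_get_msr_common()`, not KVM's real signature: known-but-unimplemented perf MSRs read back as zero instead of taking the #GP-injection path.

```c
#include <assert.h>
#include <stdint.h>

/* AMD K7 performance MSR indices (architectural values). */
#define MSR_K7_EVNTSEL0 0xc0010000u
#define MSR_K7_EVNTSEL3 0xc0010003u
#define MSR_K7_PERFCTR0 0xc0010004u
#define MSR_K7_PERFCTR3 0xc0010007u

/*
 * Illustrative stand-in for KVM's MSR-read dispatch: known perf MSRs that
 * the hypervisor does not emulate read as 0 instead of faulting.
 * Returns 0 on success, -1 where KVM would inject #GP into the guest.
 */
int rdmsr_default_zero(uint32_t msr, uint64_t *pdata)
{
    if ((msr >= MSR_K7_EVNTSEL0 && msr <= MSR_K7_EVNTSEL3) ||
        (msr >= MSR_K7_PERFCTR0 && msr <= MSR_K7_PERFCTR3)) {
        *pdata = 0;        /* benign read: guest warning suppressed */
        return 0;
    }
    return -1;             /* unhandled MSR: #GP path */
}
```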
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 7:32 AM, Theodore Ts'o ty...@mit.edu wrote: On Wed, Aug 13, 2014 at 12:48:41AM -0700, H. Peter Anvin wrote: The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness, instead it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different. Without getting into an argument about which definition of seed is correct --- it's certainly confusing and different from the RDSEED usage of the word seed. Do we expect that anyone else besides arch_get_rnd_seed() would actually want to use it? If you mean random.c instead of arch_get_rnd_seed, then I don't expect there to be other users. Aside from the best-effort bit causing this to be basically useless on old bare metal, the interface is really awkward for anything other than the use in random.c. I'd argue no; we want the rest of the kernel to either use get_random_bytes() or prandom_u32(). Given that, maybe we should just call it arch_random_init(), and expect that the only user of this interface would be drivers/char/random.c? Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. --Andy
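The interface shape under discussion can be sketched as follows. Note that `arch_random_init()` with this exact signature is the proposal being debated, not an existing kernel API, and the stub below models an arch with no RDSEED/RDRAND/KVM seed source; the mixing function stands in for the generic core in drivers/char/random.c that would call the hook at boot and again on resume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical best-effort hook: fill up to n seed words and report how
 * many were actually obtained. On old bare metal with no hardware RNG and
 * no hypervisor seed, the honest answer is zero.
 */
size_t arch_random_init(uint64_t *seeds, size_t n)
{
    (void)seeds;
    (void)n;
    return 0;            /* best effort: nothing available on this arch */
}

/* Generic core mixes in whatever it got; called at init and after resume. */
uint64_t mix_arch_seeds(uint64_t pool)
{
    uint64_t s[4];
    size_t got = arch_random_init(s, 4);
    for (size_t i = 0; i < got; i++)
        pool ^= s[i];    /* XOR is a placeholder for the real pool mixing */
    return pool;
}
```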
Regression problem with commit 5045b46803
Commit 5045b46803 added a check that cs.dpl equals cs.rpl during task switch. Unfortunately, it causes some of my tests that run well on bare metal to fail. Although this check is mentioned in Table 7-1 of the SDM as causing a #TSS exception, it is not mentioned in Table 6-6, which lists the invalid-TSS conditions that cause #TSS exceptions. Thus, I recommend reverting commit 5045b46803, or alternatively rechecking task-switch behavior on bare metal. Nadav
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 09:13 AM, Andy Lutomirski wrote: Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. Yes, we should. We also need to make it possible to do this after cloning a VM. -hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 10:45:25AM -0700, H. Peter Anvin wrote: On 08/13/2014 09:13 AM, Andy Lutomirski wrote: Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. Yes, we should. We also need to make it possible to do this after cloning a VM. Agreed. Can you send a patch? I can carry the commits to add arch_random_init() (the generic version), and the patch to call it after suspend/resume. I'll do this at the very head of the random tree, and make sure it gets pushed to Linus early during the next merge window. Does that sound like a plan? Or does someone want to suggest something different? I'm flexible... - Ted
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 11:22 AM, Theodore Ts'o ty...@mit.edu wrote: On Wed, Aug 13, 2014 at 10:45:25AM -0700, H. Peter Anvin wrote: On 08/13/2014 09:13 AM, Andy Lutomirski wrote: Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. Yes, we should. We also need to make it possible to do this after cloning a VM. Agreed. Can you send a patch? I can carry the commits to add arch_random_init() (the generic version), and the patch to call it after suspend/resume. I'll do this at the very head of the random tree, and make sure it gets pushed to Linus early during the next merge window. Does that sound like a plan? Or does someone want to suggest something different? I'm flexible... OK. Here's a proposal. I'll split the series into two parts. The first part will be the arch_random_init generic change and code to call it after suspend/resume (once I figure out the right callback). I'll send that to you. The second part will be the KVM and x86 code, which will look just like this version (v5) except for the rename. If needed, hpa and I can hash out the details we need at KS. As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too. --Andy -- Andy Lutomirski AMA Capital Management, LLC
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 11:33 AM, Andy Lutomirski wrote: As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too. We don't need a reset when migrated (although it might be a good idea under some circumstances, i.e. if the pools might somehow have gotten exposed) but definitely when cloned. -hpa
[PATCH] x86: Reset MTRR on vCPU reset
The SDM specifies (June 2014 Vol3 11.11.5):

  On a hardware reset, the P6 and more recent processors clear the valid
  flags in variable-range MTRRs and clear the E flag in the
  IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the MTRRs
  are undefined.

We currently do none of that, so whatever MTRR settings you had prior to reset is what you have after reset. Usually this doesn't matter because KVM often ignores the guest mappings and uses write-back anyway. However, if you have an assigned device and an IOMMU that allows NoSnoop for that device, KVM defers to the guest memory mappings, which are now stale after reset. The result is that OVMF rebooting on such a configuration takes a full minute to LZMA-decompress the EFI volume, a process that is nearly instant on the initial boot.

Add support for resetting the SDM-defined bits on vCPU reset. Also, by my count we're already in danger of overflowing the entries array that we pass to KVM, so I've topped it up for a bit of headroom.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: qemu-sta...@nongnu.org
---
 target-i386/cpu.c |  6 ++
 target-i386/cpu.h |  4
 target-i386/kvm.c | 14 +-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6d008ab..b5ae654 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2588,6 +2588,12 @@ static void x86_cpu_reset(CPUState *s)

     env->xcr0 = 1;

+    /* MTRR init - Clear global enable bit and valid bit in each variable reg */
+    env->mtrr_deftype &= ~MSR_MTRRdefType_Enable;
+    for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+        env->mtrr_var[i].mask &= ~MSR_MTRRphysMask_Valid;
+    }
+
 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
     if (s->cpu_index == 0) {

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index e634d83..139890f 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -337,6 +337,8 @@
 #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg))
 #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1)

+#define MSR_MTRRphysMask_Valid (1 << 11)
+
 #define MSR_MTRRfix64K_00000 0x250
 #define MSR_MTRRfix16K_80000 0x258
 #define MSR_MTRRfix16K_A0000 0x259
@@ -353,6 +355,8 @@
 #define MSR_MTRRdefType 0x2ff

+#define MSR_MTRRdefType_Enable (1 << 11)
+
 #define MSR_CORE_PERF_FIXED_CTR0 0x309
 #define MSR_CORE_PERF_FIXED_CTR1 0x30a
 #define MSR_CORE_PERF_FIXED_CTR2 0x30b

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 097fe11..cb31338 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -79,6 +79,7 @@ static int lm_capable_kernel;
 static bool has_msr_hv_hypercall;
 static bool has_msr_hv_vapic;
 static bool has_msr_hv_tsc;
+static bool has_msr_mtrr;
 static bool has_msr_architectural_pmu;
 static uint32_t num_architectural_pmu_counters;
@@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
         env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
     }

+    if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
+        has_msr_mtrr = true;
+    }
+
     return 0;
 }
@@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
     CPUX86State *env = cpu->env;
     struct {
         struct kvm_msrs info;
-        struct kvm_msr_entry entries[100];
+        struct kvm_msr_entry entries[128];
     } msr_data;
     struct kvm_msr_entry *msrs = msr_data.entries;
     int n = 0, i;
@@ -1278,6 +1283,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
         kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_REFERENCE_TSC,
                           env->msr_hv_tsc);
     }
+    if (has_msr_mtrr) {
+        kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
+        for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+            kvm_msr_entry_set(&msrs[n++],
+                              MSR_MTRRphysMask(i), env->mtrr_var[i].mask);
+        }
+    }
     /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
      * kvm_put_msr_feature_control. */
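The reset logic in the patch above boils down to two masked clears. This standalone sketch isolates it; the macro values match the patch, but `struct mtrr_state` is an illustrative stand-in for CPUX86State's MTRR fields, not QEMU's actual layout.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_MTRRdefType_Enable (1ull << 11)
#define MSR_MTRRphysMask_Valid (1ull << 11)
#define MTRR_VCNT 8   /* stands in for MSR_MTRRcap_VCNT */

struct mtrr_state {
    uint64_t deftype;
    uint64_t var_mask[MTRR_VCNT];
};

/*
 * Per SDM 11.11.5: on reset, clear only the E flag in IA32_MTRR_DEF_TYPE
 * and the valid bit in each variable-range mask register; every other MTRR
 * bit is architecturally undefined and may legitimately keep its old value.
 */
void mtrr_vcpu_reset(struct mtrr_state *s)
{
    s->deftype &= ~MSR_MTRRdefType_Enable;
    for (int i = 0; i < MTRR_VCNT; i++)
        s->var_mask[i] &= ~MSR_MTRRphysMask_Valid;
}
```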
Re: [PATCH] x86: Reset MTRR on vCPU reset
a number of comments -- feel free to address or ignore each as you see fit: On 08/13/14 21:09, Alex Williamson wrote: The SDM specifies (June 2014 Vol3 11.11.5): On a hardware reset, the P6 and more recent processors clear the valid flags in variable-range MTRRs and clear the E flag in the IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the MTRRs are undefined. We currently do none of that, so whatever MTRR settings you had prior to reset is what you have after reset. Usually this doesn't matter because KVM often ignores the guest mappings and uses write-back anyway. However, if you have an assigned device and an IOMMU that allows NoSnoop for that device, KVM defers to the guest memory mappings which are now stale after reset. The result is that OVMF rebooting on such a configuration takes a full minute to LZMA decompress the EFI volume, a process that is nearly instant on the For pedantry, instead of EFI volume we could say LZMA-compressed Firmware File System file in the FVMAIN_COMPACT firmware volume. initial boot. Add support for reseting the SDM defined bits on vCPU reset. Also, by my count we're already in danger of overflowing the entries array that we pass to KVM, so I've topped it up for a bit of headroom. Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: qemu-sta...@nongnu.org --- target-i386/cpu.c |6 ++ target-i386/cpu.h |4 target-i386/kvm.c | 14 +- 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 6d008ab..b5ae654 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2588,6 +2588,12 @@ static void x86_cpu_reset(CPUState *s) env-xcr0 = 1; +/* MTRR init - Clear global enable bit and valid bit in each variable reg */ +env-mtrr_deftype = ~MSR_MTRRdefType_Enable; +for (i = 0; i MSR_MTRRcap_VCNT; i++) { +env-mtrr_var[i].mask = ~MSR_MTRRphysMask_Valid; +} + I can see that the limit, MSR_MTRRcap_VCNT, is #defined as 8. 
Would you be willing to update the definition of the CPUX86State.mtrr_var array too, in target-i386/cpu.h? Currently it says:

    MTRRVar mtrr_var[8];

 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
     if (s->cpu_index == 0) {

 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index e634d83..139890f 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -337,6 +337,8 @@
  #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg))
  #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1)

 +#define MSR_MTRRphysMask_Valid (1 << 11)
 +

Note: a signed integer (int32_t).

  #define MSR_MTRRfix64K_00000 0x250
  #define MSR_MTRRfix16K_80000 0x258
  #define MSR_MTRRfix16K_A0000 0x259
 @@ -353,6 +355,8 @@
  #define MSR_MTRRdefType 0x2ff

 +#define MSR_MTRRdefType_Enable (1 << 11)
 +

Note: a signed integer (int32_t).

Now, if you scroll back to the bit-clearing in x86_cpu_reset(), you see

    ~MSR_MTRRdefType_Enable

and

    ~MSR_MTRRphysMask_Valid

These expressions evaluate to negative int (int32_t) values (because the bit-neg sets their sign bits). Due to two's complement (which we are allowed to assume in qemu, see HACKING), the negative int32_t values will be just correct for the next step, when they are converted to uint64_t for the bit-ands, as part of the usual arithmetic conversions. (env->mtrr_deftype and env->mtrr_var[i].mask are uint64_t.) Mathematically this means an addition of UINT64_MAX+1. (Sign extended.)

In general, even though they are correct due to two's complement, I dislike such detours into negative-valued signed integers by way of bit-neg, because people are mostly unaware of them and assume they just work. My preferred solution would be

    #define MSR_MTRRphysMask_Valid (1ull << 11)
    #define MSR_MTRRdefType_Enable (1ull << 11)

Feel free to ignore this of course.
#define MSR_CORE_PERF_FIXED_CTR00x309 #define MSR_CORE_PERF_FIXED_CTR10x30a #define MSR_CORE_PERF_FIXED_CTR20x30b diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 097fe11..cb31338 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -79,6 +79,7 @@ static int lm_capable_kernel; static bool has_msr_hv_hypercall; static bool has_msr_hv_vapic; static bool has_msr_hv_tsc; +static bool has_msr_mtrr; static bool has_msr_architectural_pmu; static uint32_t num_architectural_pmu_counters; @@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs) env-kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave)); } +if (env-features[FEAT_1_EDX] CPUID_MTRR) { +has_msr_mtrr = true; +} + Seems to match MTRR Feature Identification in my (older) copy of the SDM. return 0; } @@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level) CPUX86State *env = cpu-env;
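The signed-integer detour Laszlo describes above is easy to demonstrate in isolation: `~(1 << 11)` is the negative int -2049, which the usual arithmetic conversions sign-extend to the uint64_t value 0xfffffffffffff7ff before the bit-and, so both spellings clear exactly the same bit. This snippet is standalone demonstration code, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* The signed detour: the int-typed mask sign-extends to 64 bits. */
uint64_t clear_valid_signed(uint64_t x)
{
    return x & ~(1 << 11);
}

/* The preferred spelling: the mask is unsigned 64-bit from the start. */
uint64_t clear_valid_unsigned(uint64_t x)
{
    return x & ~(1ull << 11);
}
```

Both functions produce identical results on every input, which is precisely why the signed form "just works" while still being worth avoiding for readability.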
Re: Logging Information
Hello, I am exploring ideas for clients in the cloud to be able to implement functions where they could verify the services offered by the cloud provider, such as metering services. The idea is that I am using the concept of a write-xor-execute protection scheme together with a tamper-evident log. I am making use of the WP bit to protect page table entries so that any modification is captured in the log. Code pages of the log are also read-only, and hence any modification to them is also captured. My questions are:

1. What are the important events that one needs to log so that one could have reasonable overhead? Currently, I have a large overhead since any page table update/modification creates a trap, and in the cloud this is huge.
2. How can one create a tamper-evident logging mechanism? How could the client and the provider verify that all events are logged as intended without a miss?
3. How can one create a logging mechanism on, say, a per-client basis? In that case, if required, we could replay the log so that we could capture the malicious event.
What to log in case of untrusted hypervisor
Hello, I am working on a testbed executing some secure applications on an untrusted hypervisor (in my case KVM). In order to verify the runtime integrity of the applications, I am using an idea based on write-xor-execute protection, protecting any page table updates of hypervisor/user code/data using the WP bit, making them read-only. I capture the request in the handler, temporarily make the page writable, log the event, and then make it read-only again. I am also using a tamper-evident logging mechanism to log any events related to it. I have a few questions:

1. What are the ideal events that one needs to log so that, if one needs to replay the log, one can do so to verify?
2. How can one create a tamper-evident logging mechanism? How could the client and the provider verify that all events are logged as intended without a miss?
3. How can one create a logging mechanism on, say, a per-client basis? In that case, if required, we could replay the log so that we could capture the malicious event.
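One common answer to the tamper-evident-log question above is a hash chain: each record's digest covers the previous record's digest, so altering any earlier entry changes every later link and is detectable on replay. The sketch below uses 64-bit FNV-1a purely as a stand-in for a real cryptographic hash (use SHA-256 or similar in practice); all structure and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy FNV-1a; replace with a cryptographic hash in any real deployment. */
static uint64_t fnv1a(uint64_t h, const void *p, size_t n)
{
    const unsigned char *b = p;
    while (n--) { h ^= *b++; h *= 0x100000001b3ull; }
    return h;
}

#define LOG_MAX 64
#define CHAIN_INIT 0xcbf29ce484222325ull   /* FNV offset basis as genesis */

struct te_log { uint64_t link[LOG_MAX]; size_t n; };

/* Each new link hashes the previous link together with the event text. */
void te_append(struct te_log *l, const char *event)
{
    if (l->n >= LOG_MAX)
        return;
    uint64_t prev = l->n ? l->link[l->n - 1] : CHAIN_INIT;
    l->link[l->n++] = fnv1a(prev, event, strlen(event));
}

/* Replay: recompute the chain from the raw events, compare stored links. */
int te_verify(const struct te_log *l, const char *const *events)
{
    uint64_t h = CHAIN_INIT;
    for (size_t i = 0; i < l->n; i++) {
        h = fnv1a(h, events[i], strlen(events[i]));
        if (h != l->link[i])
            return 0;   /* tampered: some earlier event was altered */
    }
    return 1;
}
```

For the client/provider verification in question 2, both sides would periodically exchange and countersign the latest link value, so neither can later rewrite history without the other detecting a mismatch on replay.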
Re: [PATCH] x86: Reset MTRR on vCPU reset
On Wed, 2014-08-13 at 22:33 +0200, Laszlo Ersek wrote: a number of comments -- feel free to address or ignore each as you see fit: On 08/13/14 21:09, Alex Williamson wrote: The SDM specifies (June 2014 Vol3 11.11.5): On a hardware reset, the P6 and more recent processors clear the valid flags in variable-range MTRRs and clear the E flag in the IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the MTRRs are undefined. We currently do none of that, so whatever MTRR settings you had prior to reset is what you have after reset. Usually this doesn't matter because KVM often ignores the guest mappings and uses write-back anyway. However, if you have an assigned device and an IOMMU that allows NoSnoop for that device, KVM defers to the guest memory mappings which are now stale after reset. The result is that OVMF rebooting on such a configuration takes a full minute to LZMA decompress the EFI volume, a process that is nearly instant on the For pedantry, instead of EFI volume we could say LZMA-compressed Firmware File System file in the FVMAIN_COMPACT firmware volume. Can you come up with something with maybe half that many words? And also, does it matter? I want someone using OVMF and experiencing a long reboot delay to know that this might fix their problem. Noting that the major time consuming stall is in the LZMA decompression code helps to rationalize why the mapping change is important. The specific blob of data that's being decompressed seems mostly irrelevant, which is why I only gave it 2 words. initial boot. Add support for reseting the SDM defined bits on vCPU reset. Also, by my count we're already in danger of overflowing the entries array that we pass to KVM, so I've topped it up for a bit of headroom. 
Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: qemu-sta...@nongnu.org --- target-i386/cpu.c |6 ++ target-i386/cpu.h |4 target-i386/kvm.c | 14 +- 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 6d008ab..b5ae654 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2588,6 +2588,12 @@ static void x86_cpu_reset(CPUState *s) env-xcr0 = 1; +/* MTRR init - Clear global enable bit and valid bit in each variable reg */ +env-mtrr_deftype = ~MSR_MTRRdefType_Enable; +for (i = 0; i MSR_MTRRcap_VCNT; i++) { +env-mtrr_var[i].mask = ~MSR_MTRRphysMask_Valid; +} + I can see that the limit, MSR_MTRRcap_VCNT, is #defined as 8. Would you be willing to update the definition of the CPUX86State.mtrr_var array too, in target-i386/cpu.h? Currently it says: I was tempted to do that, but I was hoping there was some deeper reasoning why these were already defined separately. For instance, what if we wanted to keep a stable vmstate size, but expose fewer variable MTRRs to the guest. MSR_MTRRcap_VCNT is the number exposed to the guest, so it makes sense that we only need to clear the valid bits on those. As I look through the commits that got us here, that was probably just wishful thinking. MTRRVar mtrr_var[8]; #if !defined(CONFIG_USER_ONLY) /* We hard-wire the BSP to the first CPU. */ if (s-cpu_index == 0) { diff --git a/target-i386/cpu.h b/target-i386/cpu.h index e634d83..139890f 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -337,6 +337,8 @@ #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg)) #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1) +#define MSR_MTRRphysMask_Valid (1 11) + Note: a signed integer (int32_t). #define MSR_MTRRfix64K_00x250 #define MSR_MTRRfix16K_80x258 #define MSR_MTRRfix16K_A0x259 @@ -353,6 +355,8 @@ #define MSR_MTRRdefType 0x2ff +#define MSR_MTRRdefType_Enable (1 11) + Note: a signed integer (int32_t). 
Now, if you scroll back to the bit-clearing in x86_cpu_reset(), you see ~MSR_MTRRdefType_Enable and ~MSR_MTRRphysMask_Valid These expressions evaluate to negative int (int32_t) values (because the bit-neg sets their sign bits). Due to two's complement (which we are allowed to assume in qemu, see HACKING), the negative int32_t values will be just correct for the next step, when they are converted to uint64_t for the bit-ands, as part of the usual arithmetic conversions. (env-mtrr_deftype and env-mtrr_var[i].mask are uint64_t.) Mathematically this means an addition of UINT64_MAX+1. (Sign extended.) In general, even though they are correct due to two's complement, I dislike such detours into negative-valued signed integers by way of bit-neg, because people are mostly unaware of them and assume they just work. My preferred solution would be #define MSR_MTRRphysMask_Valid (1ull 11) #define
Re: [PATCH] x86: Reset MTRR on vCPU reset
On 08/14/14 00:06, Alex Williamson wrote: On Wed, 2014-08-13 at 22:33 +0200, Laszlo Ersek wrote: a number of comments -- feel free to address or ignore each as you see fit: On 08/13/14 21:09, Alex Williamson wrote: mappings which are now stale after reset. The result is that OVMF rebooting on such a configuration takes a full minute to LZMA decompress the EFI volume, a process that is nearly instant on the For pedantry, instead of EFI volume we could say LZMA-compressed Firmware File System file in the FVMAIN_COMPACT firmware volume. Can you come up with something with maybe half that many words? Firmware volume then. Firmware volume is not a generic term, it's a specific term in the Platform Initialization (PI) spec. And also, does it matter? No. :) I want someone using OVMF and experiencing a long reboot delay to know that this might fix their problem. Noting that the major time consuming stall is in the LZMA decompression code helps to rationalize why the mapping change is important. The specific blob of data that's being decompressed seems mostly irrelevant, which is why I only gave it 2 words. Fair enough, it's just that EFI volume doesn't mean anything specific (to me), while firmware volume does. @@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level) CPUX86State *env = cpu-env; struct { struct kvm_msrs info; -struct kvm_msr_entry entries[100]; +struct kvm_msr_entry entries[128]; } msr_data; struct kvm_msr_entry *msrs = msr_data.entries; int n = 0, i; @@ -1278,6 +1283,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level) kvm_msr_entry_set(msrs[n++], HV_X64_MSR_REFERENCE_TSC, env-msr_hv_tsc); } +if (has_msr_mtrr) { +kvm_msr_entry_set(msrs[n++], MSR_MTRRdefType, env-mtrr_deftype); +for (i = 0; i MSR_MTRRcap_VCNT; i++) { +kvm_msr_entry_set(msrs[n++], + MSR_MTRRphysMask(i), env-mtrr_var[i].mask); +} +} /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see * kvm_put_msr_feature_control. 
*/ I think that this code is correct (and sufficient for the reset problem), but I'm uncertain if it's complete: (a) Shouldn't you put the matching PhysBase registers as well (for the variable range ones)? Plus, shouldn't you put mtrr_fixed[11] too (MSR_MTRRfix64K_0, ...)? If my change wasn't isolated to the reset portion of kvm_put_msrs() then I would agree with you. But since it is, all of those registers are undefined by the SDM. That's a good way to express your point indeed, and a good way to formulate my concern: I'm not sure your change is isolated to the reset portion. The check that gates the new hunk says level = KVM_PUT_RESET_STATE and a higher level than that does exist: KVM_PUT_FULL_STATE, which is used in incoming migration. (b) You only modify kvm_put_msrs(). What about kvm_get_msrs()? I can see that you make the msr putting dependent on: /* * The following MSRs have side effects on the guest or are too * heavy for normal writeback. Limit them to reset or full state * updates. */ if (level = KVM_PUT_RESET_STATE) { But that's probably not your reason for omitting matching new code from kvm_get_msrs(): HV_X64_MSR_REFERENCE_TSC is also heavy-weight (visible in your patch's context), but that one is nevertheless handled in kvm_get_msrs(). My only reason for (b) is simply symmetry. For example, commit 48a5f3bc added HV_X64_MSR_REFERENCE_TSC at once to both put() and get(). According to target-i386/machine.c, mtrr_deftype and co. are even migrated (part of vmstate), so this asymmetry could become a problem in migration. Eg. 
source host doesn't fetch MTRR state from KVM, hence the wire format carries garbage, but on the target you put (part of) that garbage (right now, just the mask) back into KVM:

do_savevm()
  qemu_savevm_state()
    qemu_savevm_state_complete()
      cpu_synchronize_all_states()
        cpu_synchronize_state()
          kvm_cpu_synchronize_state()
            do_kvm_cpu_synchronize_state()
              kvm_arch_get_registers()
                kvm_get_msrs()

do_loadvm()
  load_vmstate()
    qemu_loadvm_state()
      cpu_synchronize_all_post_init()
        cpu_synchronize_post_init()
          kvm_cpu_synchronize_post_init()
            kvm_arch_put_registers(..., KVM_PUT_FULL_STATE)
              kvm_put_msrs(..., KVM_PUT_FULL_STATE)

/* state subset modified during VCPU reset */
#define KVM_PUT_RESET_STATE 2
/* full state set, modified during initialization or on vmload */
#define KVM_PUT_FULL_STATE 3

Hence I suspect (a) and (b) should be handled. ... And then we arrive at cross-version migration, where both source and target hosts support MTRR, but the source qemu sends
Re: [PATCH v2] KVM: x86: check ISR and TMR to construct eoi exit bitmap
Hi Wei,

On Thu, Aug 14, 2014 at 03:16:25AM +0800, Wei Wang wrote:
From: Yang Zhang yang.z.zh...@intel.com

The guest may mask the IOAPIC entry before issuing an EOI. In such a case, the EOI will not be intercepted by the hypervisor, because the corresponding bit in the EOI exit bitmap is not set. The solution is to check ISR + TMR to construct the EOI exit bitmap. This patch is a better fix for the issue that commit 0f6c0a740b tries to solve.

I think you missed the changelog.

Regards,
Wanpeng Li

Tested-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 arch/x86/kvm/lapic.c | 17 +
 arch/x86/kvm/lapic.h |  2 ++
 arch/x86/kvm/x86.c   |  9 +
 virt/kvm/ioapic.c    |  7 ---
 4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 08e8a89..0ed4bcb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -515,6 +515,23 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
 }
 
+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+			      u32 *tmr)
+{
+	u32 i, reg_off, intr_in_service;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	for (i = 0; i < 8; i++) {
+		reg_off = 0x10 * i;
+		intr_in_service = apic_read_reg(apic, APIC_ISR + reg_off) &
+			kvm_apic_get_reg(apic, APIC_TMR + reg_off);
+		if (intr_in_service) {
+			*((u32 *)eoi_exit_bitmap + i) |= intr_in_service;
+			tmr[i] |= intr_in_service;
+		}
+	}
+}
+
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..4ee3d70 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -53,6 +53,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
 
+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+			      u32 *tmr);
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 204422d..755b556 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6005,6 +6005,15 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	memset(tmr, 0, 32);
 	kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
 
+	/*
+	 * The guest may mask the IOAPIC entry before issuing an EOI. In such
+	 * a case, the EOI will not be intercepted by the hypervisor, because
+	 * the corresponding bit in the EOI exit bitmap is not set.
+	 *
+	 * The solution is to check ISR + TMR to construct the EOI exit bitmap.
+	 */
+	kvm_apic_zap_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr);
+
 	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 	kvm_apic_update_tmr(vcpu, tmr);
 }
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index e8ce34c..2458a1d 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
 	spin_lock(&ioapic->lock);
 	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
 		e = &ioapic->redirtbl[index];
-		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
-		    index == RTC_GSI) {
+		if (!e->fields.mask &&
+		    (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+		     kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
+					  index) || index == RTC_GSI)) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
 				e->fields.dest_id, e->fields.dest_mode)) {
 				__set_bit(e->fields.vector,
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
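The loop added by kvm_apic_zap_eoi_exitmap() above can be modeled in a few lines: any vector that is both in service (set in ISR) and level-triggered (set in TMR) is OR-ed into the EOI exit bitmap, so the EOI gets intercepted even if the guest has since masked the IOAPIC entry. This is a simplified standalone sketch, not the kernel code (the real function reads the APIC registers through `apic_read_reg`/`kvm_apic_get_reg`).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the patch's loop: for each of the 8 x 32-bit APIC register
 * chunks, any vector in service AND level-triggered must force an EOI
 * exit, so OR it into the exit bitmap and the caller's tmr copy. */
static void zap_eoi_exitmap(const uint32_t isr[8], const uint32_t apic_tmr[8],
                            uint32_t eoi_exit_bitmap[8], uint32_t tmr[8])
{
    for (int i = 0; i < 8; i++) {
        uint32_t intr_in_service = isr[i] & apic_tmr[i];
        if (intr_in_service) {
            eoi_exit_bitmap[i] |= intr_in_service;
            tmr[i] |= intr_in_service;
        }
    }
}
```

For example, a level-triggered vector 0x21 (chunk 1, bit 1) that is in service ends up set in eoi_exit_bitmap[1] even if the redirection-table entry is now masked.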
Re: [PATCH] x86: Reset MTRR on vCPU reset
On 08/14/14 01:17, Laszlo Ersek wrote:

- With KVM, the lack of loading MTRR state from KVM, combined with the (partial) storing of MTRR state to KVM, has two consequences:
- migration invalidates (loses) MTRR state,

I'll concede that migration *already* loses MTRR state (on KVM), even before your patch. On the incoming host, the difference is that pre-patch, the guest continues running (after migration) with MTRRs in the initial KVM state, while post-patch, the guest continues running after an explicit zeroing of the variable MTRR masks and the deftype. I admit that it wouldn't be right to say that the patch causes MTRR state loss.

With that, I think I've actually convinced myself that your patch is correct:

The x86_cpu_reset() hunk is correct in any case, independently of KVM vs. TCG. (On TCG it even improves MTRR conformance.) Splitting that hunk into a separate patch might be worthwhile, but not overly important.

The kvm_put_msrs() hunk forces a zero write to the variable MTRR PhysMasks and the DefType, on both reset and on incoming migration. For reset, this is correct behavior. For incoming migration, it is not, but it certainly shouldn't qualify as a regression, relative to the current status (where MTRR state is simply lost and replaced with initial MTRR state on the incoming host).

I think the above end results could be expressed more clearly in the code, but I'm already wondering if you'll ever talk to me again, so I'm willing to give my R-b if you think that's useful... :) (Again, I might be wrong, easily.)

Thanks
Laszlo
Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
On 08/13/2014 12:30 AM, Christoffer Dall wrote: On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote: On 08/12/2014 02:50 AM, Christoffer Dall wrote: On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote: On 08/11/2014 12:13 PM, Christoffer Dall wrote: On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:

[...]

@@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	pte_t *pte = (pte_t *)data;
 
-	stage2_set_pte(kvm, NULL, gpa, pte, false);
+	stage2_set_pte(kvm, NULL, gpa, pte, false, false);

why is logging never active if we are called from MMU notifiers?

MMU notifiers update sptes, but I don't see how these updates can result in guest dirty pages. Also, guest pages are marked dirty from the 2nd stage page fault handlers (searching through the code).

Ok, then add:

/*
 * We can always call stage2_set_pte with logging_active == false,
 * because MMU notifiers will have unmapped a huge PMD before calling
 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
 * stage2_set_pte() never needs to clear out a huge PMD through this
 * calling path.
 */

So here, on a permission change to primary ptes, the kernel first invalidates the related s2ptes, followed by ->change_pte calls to synchronize the s2ptes. As a consequence of the invalidation we unmap huge PMDs if a page falls in that range. Is the comment meant to point out the use of the logging flag under various scenarios?

The comment is there because, when you look at this function, it is not obvious why we pass logging_active=false even though logging may actually be active. This could suggest that the parameter to stage2_set_pte() should be named differently (break_huge_pmds or something like that), but we can also be satisfied with the comment.

Ok I see, I was thinking you thought it was breaking something. Yeah, I'll add the comment; in reality this is another use case where a PMD may need to be converted to a page table, so it makes sense to contrast the use cases.
Should I add comments on flag use in other places as well?

It's always a judgement call. I didn't find it necessary to put a comment elsewhere because I think it's pretty obvious that we would never care about logging writes to device regions.

However, this made me think: are we making sure that we are not marking device mappings as read-only in the wp_range functions? I think it's quite bad if we mark the VCPU interface as read-only, for example.

-Christoffer

The KVM_SET_USER_MEMORY_REGION ioctl doesn't check the type of the region being installed/operated on (KVM_MEM_LOG_DIRTY_PAGES). In the case of QEMU, these regions wind up in KVMState->KVMSlot[]; when memory_region_add_subregion() is called, the KVM listener installs it. For migration and dirty page logging, QEMU walks the KVMSlot[] array.

For QEMU VFIO (PCI), mmap()ing a BAR of type IORESOURCE_MEM causes the memory region to be added to KVMState->KVMSlot[]. In that case it's possible to walk KVMState->KVMSlot[], issue the ioctl, and come across a device mapping with normal memory and write-protect its s2ptes (VFIO sets unmigratable state, though).

But I'm not sure what's there to stop someone calling the ioctl and installing a region with device memory type. Most likely, though, if you installed that kind of region, migration would be disabled. But for logging use alone, not checking the memory type could be an issue.
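The concern discussed above could be addressed by an explicit guard when deciding which memslots to write-protect for dirty logging. The sketch below is purely hypothetical: the minimal `struct memslot`, and the `KVM_MEM_DEVICE` flag in particular, are invented here for illustration (KVM has no such flag; only KVM_MEM_LOG_DIRTY_PAGES is real), so this shows the shape of the check rather than an actual KVM interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_MEM_LOG_DIRTY_PAGES (1u << 0)   /* real KVM flag */
#define KVM_MEM_DEVICE          (1u << 30)  /* invented for this sketch only */

/* Minimal stand-in for a memory slot; real KVM tracks much more. */
struct memslot {
    uint32_t flags;
};

/* Only write-protect slots that request dirty logging and are known to be
 * backed by normal memory; skipping device-backed slots avoids making
 * something like the VCPU interface read-only. */
static bool should_write_protect(const struct memslot *slot)
{
    return (slot->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
           !(slot->flags & KVM_MEM_DEVICE);
}
```

The point of the design is that the policy decision ("is this slot safe to write-protect?") lives in one predicate, instead of being implied by whoever happens to call the ioctl.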
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 11:44 AM, H. Peter Anvin wrote: On 08/13/2014 11:33 AM, Andy Lutomirski wrote:

As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too.

We don't need a reset when migrated (although it might be a good idea under some circumstances, i.e. if the pools might somehow have gotten exposed) but definitely when cloned. But yes, we need a notification. For obvious reasons there is no suspend event (one can snapshot a running VM), but we need to be notified upon wakeup, *or* we need to give KVM a way to update the necessary state.

-hpa
Re: The status about vhost-net on kvm-arm?
On 2014/8/13 19:25, Nikolay Nikolaev wrote: On Wed, Aug 13, 2014 at 12:10 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote:

Hello,

On Tue, Aug 12, 2014 at 5:41 AM, Li Liu john.li...@huawei.com wrote:

Hi all,

Can anyone tell me the current status of vhost-net on kvm-arm? Half a year has passed since Isa Ansharullah asked this question: http://www.spinics.net/lists/kvm-arm/msg08152.html

I have found two patches which have provided the kvm-arm support of eventfd and irqfd:

1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM
http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html

2) [RFC,v3] ARM: KVM: add irqfd and irq routing support
https://patches.linaro.org/32261/

And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan:

[Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio
https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

But there are no comments on this patch, and I can find nothing about qemu supporting irqfd. Have I lost track? If nobody is trying to fix it, we have a plan to complete it, with virtio-mmio supporting irqfd and multiqueue.

We at Virtual Open Systems did some work and tested vhost-net on ARM back in March. The setup was based on:
- a host kernel with our ioeventfd patches: http://www.spinics.net/lists/kvm-arm/msg08413.html
- qemu with the aforementioned patches from Ying-Shiuan Pan: https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

The testbed was an ARM Chromebook with Exynos 5250, using a 1Gbps USB3 Ethernet adapter connected to a 1Gbps switch. I can't find the actual numbers, but I remember that with multiple streams the gain was clearly seen. Note that it used the minimum required ioeventfd implementation and not irqfd. I guess it is feasible to think that it all can be put together and rebased + the recent irqfd work.
One can achieve even better performance (because of the irqfd). I managed to replicate the setup with the old versions we used in March: a single stream from another machine to the Chromebook with a 1Gbps USB3 Ethernet adapter.

iperf -c address -P 1 -i 1 -p 5001 -f k -t 10
to HOST: 858316 Kbits/sec
to GUEST: 761563 Kbits/sec
to GUEST vhost=off: 508150 Kbits/sec

10 parallel streams:
iperf -c address -P 10 -i 1 -p 5001 -f k -t 10
to HOST: 842420 Kbits/sec
to GUEST: 625144 Kbits/sec
to GUEST vhost=off: 425276 Kbits/sec

I have tested the same cases on a Hisilicon board (Cortex-A15@1G) with an integrated 1Gbps Ethernet adapter.

iperf -c address -P 1 -i 1 -p 5001 -f M -t 10
to HOST: 906 Mbits/sec
to GUEST: 562 Mbits/sec
to GUEST vhost=off: 340 Mbits/sec

With 10 parallel streams, performance improves by a further 10%:
iperf -c address -P 10 -i 1 -p 5001 -f M -t 10
to HOST: 923 Mbits/sec
to GUEST: 592 Mbits/sec
to GUEST vhost=off: 364 Mbits/sec

It's easy to see that vhost-net brings great performance improvements, almost 50%+.

Li.

___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

regards,
Nikolay Nikolaev
Virtual Open Systems
[PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change, but this is not true: it calls get_order() (which takes bytes) where it should have called ilog2(), and the kernel stops on a VM_BUG_ON(). This replaces get_order() with order_base_2() (the round-up version of ilog2).

Suggested-by: Paul Mackerras pau...@samba.org
Cc: Alexander Graf ag...@suse.de
Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Cc: Joonsoo Kim iamjoonsoo@lge.com
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v2:
* s/ilog2/order_base_2/
* removed cc: sta...@vger.kernel.org as I got the wrong impression that v3.16 is broken
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 329d7fd..b9615ba 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
+	page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
 	unsigned long align_pages = HPT_ALIGN_PAGES;
 
-	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+	VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
+	return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
--
2.0.0
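The bug the patch fixes can be demonstrated with simplified stand-ins for the two helpers. These are illustrative reimplementations assuming a 4 KiB page size and a hypothetical 16384-page region, not the actual kernel macros: get_order() interprets its argument as a size in bytes, while order_base_2() is a round-up ilog2, so feeding a page count to get_order() yields a far smaller (wrong) order.

```c
#include <assert.h>

#define PAGE_SHIFT 12  /* assume 4 KiB pages for illustration */

/* floor(log2(n)) for n > 0 -- simplified stand-in for the kernel's ilog2() */
static int ilog2(unsigned long n)
{
    int r = -1;
    while (n) {
        n >>= 1;
        r++;
    }
    return r;
}

/* ceil(log2(n)) -- stand-in for the kernel's order_base_2() */
static int order_base_2(unsigned long n)
{
    return (n & (n - 1)) ? ilog2(n) + 1 : ilog2(n);
}

/* allocation order for a size given in BYTES -- stand-in for get_order() */
static int get_order(unsigned long size)
{
    size = (size - 1) >> PAGE_SHIFT;
    return size ? ilog2(size) + 1 : 0;
}
```

With a hypothetical kvm_rma_pages of 16384 (a 64 MiB RMA in 4 KiB pages), cma_alloc() wants order_base_2(16384) = 14, but get_order(16384) treats 16384 as bytes and returns only 2 — hence the VM_BUG_ON() the commit message describes.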
Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy a...@ozlabs.ru writes:

fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change, but this is not true: it calls get_order() (which takes bytes) where it should have called ilog2(), and the kernel stops on a VM_BUG_ON(). This replaces get_order() with order_base_2() (the round-up version of ilog2).

Suggested-by: Paul Mackerras pau...@samba.org
Cc: Alexander Graf ag...@suse.de
Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Cc: Joonsoo Kim iamjoonsoo@lge.com
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

---
Changes:
v2:
* s/ilog2/order_base_2/
* removed cc: sta...@vger.kernel.org as I got the wrong impression that v3.16 is broken
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 329d7fd..b9615ba 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
+	page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
 	unsigned long align_pages = HPT_ALIGN_PAGES;
 
-	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+	VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
+	return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
--
2.0.0
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 7:41 PM, H. Peter Anvin h...@zytor.com wrote: On 08/13/2014 11:44 AM, H. Peter Anvin wrote: On 08/13/2014 11:33 AM, Andy Lutomirski wrote:

As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too.

We don't need a reset when migrated (although it might be a good idea under some circumstances, i.e. if the pools might somehow have gotten exposed) but definitely when cloned. But yes, we need a notification. For obvious reasons there is no suspend event (one can snapshot a running VM), but we need to be notified upon wakeup, *or* we need to give KVM a way to update the necessary state.

This could presumably use the interrupt mechanism on virtio-rng if we're willing to depend on having host support for virtio-rng. v6 (coming in a few minutes) will at least get it right when the kernel goes through the resume path (i.e. not KVM/QEMU suspend, and maybe not S0ix either).

--Andy
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/13/2014 05:18 AM, David Matlack wrote: On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

@@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 {
 	struct kvm_memslots *old_memslots = kvm->memslots;

I think you want slots->generation = old_memslots->generation; here. On the KVM_MR_DELETE path, install_new_memslots is called twice, so this patch introduces a short window of time where the generation number actually decreases.

Yes, indeed. Thank you for pointing it out, will update.
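The invariant David is asking for can be shown with a toy model: the new slots array must inherit the old generation before bumping it, so the generation never moves backwards even when install_new_memslots() runs twice in a row (as on the KVM_MR_DELETE path). This is a simplified standalone sketch, not the kernel code, and the struct here carries only the generation field.

```c
#include <assert.h>

/* Minimal stand-in for struct kvm_memslots: just the generation counter. */
struct memslots {
    unsigned long generation;
};

/* Swap in a new slots array while keeping the generation monotonic:
 * copy the old generation first, then increment. Returns the old array
 * (the kernel frees it after an RCU grace period). */
static struct memslots *install_new_memslots(struct memslots **cur,
                                             struct memslots *slots)
{
    struct memslots *old = *cur;

    slots->generation = old->generation;  /* inherit... */
    *cur = slots;
    slots->generation++;                  /* ...then strictly increase */
    return old;
}
```

Without the inherit step, a freshly zero-initialized slots array installed on the second call would briefly publish a generation lower than the one readers already observed, which is exactly the stale-mmio-cache window the thread discusses.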
[PATCH v6 0/7] random,x86,kvm: Rework arch RNG seeds and get some from kvm
This introduces and uses a very simple synchronous mechanism to get /dev/urandom-style bits appropriate for initial KVM PV guest RNG seeding.

It also reworks the way that architectural random data is fed into random.c's pools. Timekeeping randomness now comes directly from the timekeeping core rather than being pulled in from init_std_data, and timekeeping randomness is added both on boot and on resume.

I added a new arch hook called arch_rng_init. The default implementation is more or less the same as the current code, except that random_get_entropy is now called unconditionally. We now also call init_std_data on resume.

x86 gets a custom arch_rng_init. It will use KVM_GET_RNG_SEED if available, and, if it does anything, it will log the number of bits collected from each available architectural source. If more paravirt seed sources show up, it will be a natural place to add them.

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v5:
- Moved the generic changes to the beginning.
- Renamed arch_get_rng_seed to arch_rng_init.
- The timekeeping change is new.
- random.c registers a syscore callback to reseed on resume.

Changes from v4:
- Got rid of the RDRAND behavior change. If this series is accepted, I may resend it separately, but I think it's an unrelated issue.
- Fix up the changelog entries -- I misunderstood how the old code worked.
- Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not available.

Changes from v3:
- Other than KASLR, the guest pieces are completely rewritten. Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
- Bisection fix (patch 2 had a misplaced brace). The final state is identical to that of v2.
- Improve the 0/5 description a little bit.

Changes from v1:
- Split patches 2 and 3
- Log all arch sources in init_std_data
- Fix the 32-bit kaslr build

Andy Lutomirski (7):
  random: Add and use arch_rng_init
  random, timekeeping: Collect timekeeping entropy in the timekeeping code
  random: Reseed pools on resume
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  x86,random: Add an x86 implementation of arch_rng_init
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_rng_init
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig                     |  4 ++
 arch/x86/boot/compressed/aslr.c      | 27 +
 arch/x86/include/asm/archrandom.h    |  6 +++
 arch/x86/include/asm/kvm_guest.h     |  9 +
 arch/x86/include/asm/processor.h     | 21 --
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile             |  2 +
 arch/x86/kernel/archrandom.c         | 74 
 arch/x86/kernel/kvm.c                | 10 +
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   |  4 ++
 drivers/char/random.c                | 42 
 include/linux/random.h               | 40 +++
 kernel/time/timekeeping.c            | 11 ++
 15 files changed, 246 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c
--
1.9.3
[PATCH v6 7/7] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM. Rather than reinventing all of the CPU feature query code, this fixes native_cpuid to work in PIC objects. I haven't combined it with boot/cpuflags.c's cpuid implementation: including asm/processor.h from boot/cpuflags.c results in a flood of unrelated errors, and fixing it might be messy.

Reviewed-by: Kees Cook keesc...@chromium.org
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "movl %%ebx,%1\n\t"
+		     ".endif ; .endif \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
	      "=c" (*ecx),
	      "=d" (*edx)
	    : "0" (*eax), "2" (*ecx)
--
1.9.3
[PATCH v6 6/7] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_rng_init
This is a straightforward implementation: for each bit of internal RNG state, request one bit from KVM_GET_RNG_SEED. This is done even if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide cryptographically secure output even if the CPU's RNG is weak or compromised.

Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig                 |  4 
 arch/x86/include/asm/kvm_guest.h |  9 +
 arch/x86/kernel/archrandom.c     | 25 -
 arch/x86/kernel/kvm.c            | 10 ++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d24887b..ad87278 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -594,6 +594,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_RANDOM
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -1508,6 +1509,9 @@ config ARCH_RANDOM
 	  If supported, this is a high bandwidth, cryptographically secure
 	  hardware random number generator.
 
+	  This also enables paravirt RNGs such as KVM's if the relevant
+	  PV guest support is enabled.
+
 config X86_SMAP
 	def_bool y
 	prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@
 int kvm_setup_vsyscall_timeinfo(void);
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index e8d2ffb..adbaa25 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */
 
 #include <asm/archrandom.h>
+#include <asm/kvm_guest.h>
 
 void arch_rng_init(void *ctx,
 		   void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_rng_init(void *ctx,
 		   const char *log_prefix)
 {
 	int i;
-	int rdseed_bits = 0, rdrand_bits = 0;
+	int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
 	char buf[128] = "";
 	char *msgptr = buf;
@@ -42,10 +43,32 @@ void arch_rng_init(void *ctx,
 #endif
 	}
 
+	/*
+	 * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG
+	 * worked, since it incorporates entropy unavailable to the CPU,
+	 * and we shouldn't trust the hardware RNG more than we need to.
+	 * We request enough bits for the entire internal RNG state,
+	 * because there's no good reason not to.
+	 */
+	for (i = 0; i < bits_per_source; i += 64) {
+		u64 rv;
+
+		if (kvm_get_rng_seed(&rv)) {
+			seed(ctx, (u32)rv);
+			seed(ctx, (u32)(rv >> 32));
+			kvm_bits += 8 * sizeof(rv);
+		} else {
+			break;	/* If it fails once, it will keep failing. */
+		}
+	}
+
 	if (rdseed_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
 	if (rdrand_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (kvm_bits)
+		msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+				  kvm_bits);
 	if (buf[0])
 		pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+bool kvm_get_rng_seed(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+		rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
--
1.9.3
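The seeding loop in arch_rng_init() above can be modeled outside the kernel: a 64-bit seed source is drained one quadword at a time, split into two 32-bit words for the seed callback, until bits_per_source bits have been fed in or the source fails. The fake fixed-value source and the word-counting callback below are invented for this sketch; a real backend (RDSEED, RDRAND, or KVM_GET_RNG_SEED) would return fresh entropy and can stop producing at any point.

```c
#include <assert.h>
#include <stdint.h>

/* Fake seed source: always succeeds with a constant (illustration only). */
static int fake_get_seed(uint64_t *rv)
{
    *rv = 0x0123456789abcdefULL;
    return 1;
}

/* Drain the source 64 bits at a time, feeding the callback two 32-bit
 * words per iteration; stop early if the source fails (as the kernel
 * comment says, if it fails once it will keep failing). Returns the
 * number of bits actually collected. */
static int seed_from_source(void *ctx, void (*seed)(void *ctx, uint32_t data),
                            int bits_per_source)
{
    int i, bits = 0;

    for (i = 0; i < bits_per_source; i += 64) {
        uint64_t rv;

        if (!fake_get_seed(&rv))
            break;
        seed(ctx, (uint32_t)rv);
        seed(ctx, (uint32_t)(rv >> 32));
        bits += 64;  /* 8 * sizeof(rv), as in the patch */
    }
    return bits;
}

/* Callback that just counts how many 32-bit words it was fed. */
static void count_words(void *ctx, uint32_t data)
{
    (void)data;
    (*(int *)ctx)++;
}
```

Requesting 256 bits, for instance, runs four loop iterations and delivers eight 32-bit words to the callback.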
[PATCH v6 5/7] x86,random: Add an x86 implementation of arch_rng_init
This does the same thing as the generic implementation, except that it logs how many bits of each type it collected. I want to know whether the initial seeding is working and, if so, whether the RNG is fast enough. (I know that hpa assures me that the hardware RNG is more than fast enough, but I'd still like a direct way to verify this.)

Arguably, arch_get_random_seed could be removed now: I'm having some trouble imagining a sensible non-architecture-specific use of it that wouldn't be better served by arch_rng_init.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/include/asm/archrandom.h |  6 +++++
 arch/x86/kernel/Makefile          |  2 ++
 arch/x86/kernel/archrandom.c      | 51 ++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 69f1366..5611c21 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, RDSEED_INT, ASM_NOP4);
 #define arch_has_random()	static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed()	static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_RNG_INIT
+extern void arch_rng_init(void *ctx,
+			  void (*seed)(void *ctx, u32 data),
+			  int bits_per_source,
+			  const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)	+= archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION)	+= check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 000..e8d2ffb
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,51 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/archrandom.h>
+
+void arch_rng_init(void *ctx,
+		   void (*seed)(void *ctx, u32 data),
+		   int bits_per_source,
+		   const char *log_prefix)
+{
+	int i;
+	int rdseed_bits = 0, rdrand_bits = 0;
+	char buf[128] = "";
+	char *msgptr = buf;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv))
+			rdseed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			rdrand_bits += 8 * sizeof(rv);
+		else
+			continue;	/* Don't waste time mixing. */
+
+		seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+		seed(ctx, (u32)(rv >> 32));
+#endif
+	}
+
+	if (rdseed_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+	if (rdrand_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (buf[0])
+		pr_info("%s with %s\n", log_prefix, buf + 2);
+}
-- 
1.9.3
[PATCH v6 3/7] random: Reseed pools on resume
After a suspend/resume cycle, and especially after hibernating, we should assume that the random pools might have leaked. To minimize the risk this poses, try to collect fresh architectural entropy on resume.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 26 +++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 8dc3e3a..0811ad4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -257,6 +257,7 @@
 #include <linux/kmemcheck.h>
 #include <linux/workqueue.h>
 #include <linux/irq.h>
+#include <linux/syscore_ops.h>
 
 #include <asm/processor.h>
 #include <asm/uaccess.h>
@@ -1279,6 +1280,26 @@
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
+static void init_all_pools(void)
+{
+	init_std_data(&input_pool);
+	init_std_data(&blocking_pool);
+	init_std_data(&nonblocking_pool);
+}
+
+static void random_resume(void)
+{
+	/*
+	 * After resume (and especially after hibernation / kexec resume),
+	 * make a best-effort attempt to collect fresh entropy.
+	 */
+	init_all_pools();
+}
+
+static struct syscore_ops random_syscore_ops = {
+	.resume = random_resume,
+};
+
 /*
  * Note that setup_arch() may call add_device_randomness()
  * long before we get here. This allows seeding of the pools
@@ -1291,9 +1312,8 @@ static void init_std_data(struct entropy_store *r)
  */
 static int rand_initialize(void)
 {
-	init_std_data(&input_pool);
-	init_std_data(&blocking_pool);
-	init_std_data(&nonblocking_pool);
+	init_all_pools();
+	register_syscore_ops(&random_syscore_ops);
 	return 0;
 }
 early_initcall(rand_initialize);
-- 
1.9.3
[PATCH v6 4/7] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of host nonblocking entropy. This is independent of virtio-rng for a couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort cryptographically secure data for use as a seed. It provides no guarantee that the result contains any actual entropy.

Acked-by: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@
 KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN	0x4b564d02
 #define MSR_KVM_STEAL_TIME	0x4b564d03
 #define MSR_KVM_PV_EOI_EN	0x4b564d04
+#define MSR_KVM_GET_RNG_SEED	0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef432f8..695b682 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
[PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code
Currently, init_std_data calls ktime_get_real(). This imposes awkward constraints on when init_std_data can be called, and init_std_data is unlikely to collect the full unpredictable data available to the timekeeping code, especially after resume.

Remove this code from random.c and add the appropriate add_device_randomness calls to timekeeping.c instead.

Cc: John Stultz <john.stu...@linaro.org>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c     |  2 --
 kernel/time/timekeeping.c | 11 +++++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 7673e60..8dc3e3a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1263,12 +1263,10 @@ static void seed_entropy_store(void *ctx, u32 data)
 static void init_std_data(struct entropy_store *r)
 {
 	int i;
-	ktime_t now = ktime_get_real();
 	unsigned long rv;
 	char log_prefix[128];
 
 	r->last_pulled = jiffies;
-	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
 		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 32d8d6a..9609db9 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -23,6 +23,7 @@
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/compiler.h>
+#include <linux/random.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
@@ -835,6 +836,9 @@ void __init timekeeping_init(void)
 	memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));
 	write_seqcount_end(&timekeeper_seq);
+
+	add_device_randomness(tk, sizeof(tk));
+
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 
@@ -976,6 +980,13 @@ static void timekeeping_resume(void)
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
 	write_seqcount_end(&timekeeper_seq);
+
+	/*
+	 * The timekeeping state has a decent chance of differing
+	 * between resumptions of the same image.
+	 */
+	add_device_randomness(tk, sizeof(tk));
+
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	touch_softlockup_watchdog();
-- 
1.9.3
[PATCH v6 1/7] random: Add and use arch_rng_init
Currently, init_std_data contains its own logic for using arch random sources. This replaces that logic with a generic function arch_rng_init that allows arch code to supply its own logic. The default implementation tries arch_get_random_seed_long and arch_get_random_long individually.

The only functional change here is that random_get_entropy() is used unconditionally instead of being used only when the arch sources fail. This may add a tiny amount of security.

Acked-by: Theodore Ts'o <ty...@mit.edu>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c  | 14 +++++++++++---
 include/linux/random.h | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 71529e1..7673e60 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1246,6 +1246,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);
 
+static void seed_entropy_store(void *ctx, u32 data)
+{
+	mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}
 
 /*
  * init_std_data - initialize pool with system data
@@ -1261,15 +1265,19 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	char log_prefix[128];
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
-			rv = random_get_entropy();
+		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
+
+	sprintf(log_prefix, "random: seeded %s pool", r->name);
+	arch_rng_init(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+		      log_prefix);
+
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..c8d692e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifndef __HAVE_ARCH_RNG_INIT
+
+/**
+ * arch_rng_init() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data. An arch-specific implementation should be no worse
+ * than this generic implementation. If the arch code does something
+ * interesting, it may log something of the form "log_prefix with
+ * 8 bits of stuff".
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_rng_init(void *ctx,
+				 void (*seed)(void *ctx, u32 data),
+				 int bits_per_source,
+				 const char *log_prefix)
+{
+	int i;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv) ||
+		    arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+		}
+	}
+}
+
+#endif /* __HAVE_ARCH_RNG_INIT */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
[PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception
This patch emulates debug registers and the debug exception to support a guest using debug resources. This enables running gdb/kgdb etc. in the guest.

On the BOOKE architecture we cannot share debug resources between QEMU and the guest because:

  - When QEMU is using debug resources, the debug exception must always
    be enabled. To achieve this we set MSR_DE and also set MSRP_DEP so
    the guest cannot change MSR_DE.

  - When emulating debug resources for the guest, we want the guest to
    control MSR_DE (enable/disable the debug interrupt on need).

These two configurations cannot be supported at the same time, so we cannot share debug resources between QEMU and the guest on BOOKE. In the current design QEMU gets priority over the guest: if QEMU is using debug resources then the guest cannot use them, and if the guest is using them then QEMU can overwrite them.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
v3->v4
 - Clear only MRR on vcpu init

 arch/powerpc/include/asm/kvm_ppc.h   |   3 +
 arch/powerpc/include/asm/reg_booke.h |   2 +
 arch/powerpc/kvm/booke.c             |  42 +++++-
 arch/powerpc/kvm/booke_emulate.c     | 148 +++++++++++++++++++++
 4 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fb86a22..05e58b6 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -206,6 +206,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server,
 extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
 extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);
 
+void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu);
+void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu);
+
 union kvmppc_one_reg {
 	u32	wval;
 	u64	dval;
diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index 464f108..150d485 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -307,6 +307,8 @@
  * DBSR bits which have conflicting definitions on true Book E versus IBM 40x.
  */
 #ifdef CONFIG_BOOKE
+#define DBSR_IDE	0x80000000	/* Imprecise Debug Event */
+#define DBSR_MRR	0x30000000	/* Most Recent Reset */
 #define DBSR_IC		0x08000000	/* Instruction Completion */
 #define DBSR_BT		0x04000000	/* Branch Taken */
 #define DBSR_IRPT	0x02000000	/* Exception Debug Event */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 074b7fc..6901862 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -267,6 +267,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu)
 	clear_bit(BOOKE_IRQPRIO_WATCHDOG, &vcpu->arch.pending_exceptions);
 }
 
+void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu)
+{
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG);
+}
+
+void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu)
+{
+	clear_bit(BOOKE_IRQPRIO_DEBUG, &vcpu->arch.pending_exceptions);
+}
+
 static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1)
 {
 	kvmppc_set_srr0(vcpu, srr0);
@@ -735,7 +745,32 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	struct debug_reg *dbg_reg = &(vcpu->arch.dbg_reg);
 	u32 dbsr = vcpu->arch.dbsr;
 
-	/* Clear guest dbsr (vcpu->arch.dbsr) */
+	if (vcpu->guest_debug == 0) {
+		/*
+		 * Debug resources belong to Guest.
+		 * Imprecise debug event is not injected
+		 */
+		if (dbsr & DBSR_IDE) {
+			dbsr &= ~DBSR_IDE;
+			if (!dbsr)
+				return RESUME_GUEST;
+		}
+
+		if (dbsr && (vcpu->arch.shared->msr & MSR_DE) &&
+		    (vcpu->arch.dbg_reg.dbcr0 & DBCR0_IDM))
+			kvmppc_core_queue_debug(vcpu);
+
+		/* Inject a program interrupt if trap debug is not allowed */
+		if ((dbsr & DBSR_TIE) && !(vcpu->arch.shared->msr & MSR_DE))
+			kvmppc_core_queue_program(vcpu, ESR_PTR);
+
+		return RESUME_GUEST;
+	}
+
+	/*
+	 * Debug resource owned by userspace.
+	 * Clear guest dbsr (vcpu->arch.dbsr)
+	 */
 	vcpu->arch.dbsr = 0;
 	run->debug.arch.status = 0;
 	run->debug.arch.address = vcpu->arch.pc;
@@ -1249,6 +1284,11 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
 	setup_timer(&vcpu->arch.wdt_timer, kvmppc_watchdog_func,
 		    (unsigned long)vcpu);
 
+	/*
+	 * Clear DBSR.MRR to avoid guest debug interrupt as
+	 * this is of host interest
+	 */
+	mtspr(SPRN_DBSR, DBSR_MRR);
 	return 0;
 }
 
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index
[PATCH] KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
This was missed in the respective one_reg implementation patch.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 Documentation/virtual/kvm/api.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a21ff22..9177f23 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1878,6 +1878,8 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_ARCH_COMPAT     | 32
   PPC   | KVM_REG_PPC_DABRX           | 32
   PPC   | KVM_REG_PPC_WORT            | 64
+  PPC   | KVM_REG_PPC_SPRG9           | 64
+  PPC   | KVM_REG_PPC_DBSR            | 32
   PPC   | KVM_REG_PPC_TM_GPR0         | 64
           ...
   PPC   | KVM_REG_PPC_TM_GPR31        | 64
-- 
1.9.3
Re: [PATCH] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy <a...@ozlabs.ru> writes:

> fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
> functional change, but this is not true: it calls get_order() (which
> takes bytes) where it should have called ilog2(), and the kernel stops
> on a VM_BUG_ON(). This replaces get_order() with ilog2().

Should we round it up? ie, ilog2(kvm_rma_pages - 1) + 1 ?

> Suggested-by: Paul Mackerras <pau...@samba.org>
> Cc: Alexander Graf <ag...@suse.de>
> Cc: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
> Cc: Joonsoo Kim <iamjoonsoo@lge.com>
> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> Cc: sta...@vger.kernel.org

Why stable? We merged it this merge window.

> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 329d7fd..bfe9f01 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
>  	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
>  	if (!ri)
>  		return NULL;
> -	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
> +	page = cma_alloc(kvm_cma, kvm_rma_pages, ilog2(kvm_rma_pages));
>  	if (!page)
>  		goto err_out;
>  	atomic_set(&ri->use_count, 1);
> @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
>  {
>  	unsigned long align_pages = HPT_ALIGN_PAGES;
>
> -	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
> +	VM_BUG_ON(ilog2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
>
>  	/* Old CPUs require HPT aligned on a multiple of its size */
>  	if (!cpu_has_feature(CPU_FTR_ARCH_206))
>  		align_pages = nr_pages;
> -	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
> +	return cma_alloc(kvm_cma, nr_pages, ilog2(align_pages));
>  }
>  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
> --
> 2.0.0
[PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change, but this is not true: it calls get_order() (which takes bytes) where it should have called ilog2(), and the kernel stops on a VM_BUG_ON(). This replaces get_order() with order_base_2() (the round-up version of ilog2).

Suggested-by: Paul Mackerras <pau...@samba.org>
Cc: Alexander Graf <ag...@suse.de>
Cc: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo@lge.com>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
Changes:
v2:
 * s/ilog2/order_base_2/
 * removed cc: sta...@vger.kernel.org as I got wrong impression that
   v3.16 is broken
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 329d7fd..b9615ba 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
+	page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
 	unsigned long align_pages = HPT_ALIGN_PAGES;
 
-	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+	VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
+	return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
-- 
2.0.0
Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy <a...@ozlabs.ru> writes:

> fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
> functional change, but this is not true: it calls get_order() (which
> takes bytes) where it should have called ilog2(), and the kernel stops
> on a VM_BUG_ON(). This replaces get_order() with order_base_2()
> (the round-up version of ilog2).
>
> Suggested-by: Paul Mackerras <pau...@samba.org>
> Cc: Alexander Graf <ag...@suse.de>
> Cc: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
> Cc: Joonsoo Kim <iamjoonsoo@lge.com>
> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>

Reviewed-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>

> ---
> Changes:
> v2:
>  * s/ilog2/order_base_2/
>  * removed cc: sta...@vger.kernel.org as I got wrong impression that
>    v3.16 is broken
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 329d7fd..b9615ba 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
>  	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
>  	if (!ri)
>  		return NULL;
> -	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
> +	page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
>  	if (!page)
>  		goto err_out;
>  	atomic_set(&ri->use_count, 1);
> @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
>  {
>  	unsigned long align_pages = HPT_ALIGN_PAGES;
>
> -	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
> +	VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
>
>  	/* Old CPUs require HPT aligned on a multiple of its size */
>  	if (!cpu_has_feature(CPU_FTR_ARCH_206))
>  		align_pages = nr_pages;
> -	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
> +	return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
>  }
>  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
> --
> 2.0.0