Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
On 08/12/2014 02:50 AM, Christoffer Dall wrote:
On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
On 08/11/2014 12:13 PM, Christoffer Dall wrote:
On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:

[...]

@@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	pte_t *pte = (pte_t *)data;

-	stage2_set_pte(kvm, NULL, gpa, pte, false);
+	stage2_set_pte(kvm, NULL, gpa, pte, false, false);

Why is logging never active if we are called from MMU notifiers?

MMU notifiers update sptes, but I don't see how these updates can result in guest dirty pages. Also, guest pages are marked dirty from the 2nd stage page fault handlers (searching through the code).

Ok, then add:

/*
 * We can always call stage2_set_pte with logging_active == false,
 * because MMU notifiers will have unmapped a huge PMD before calling
 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
 * stage2_set_pte() never needs to clear out a huge PMD through this
 * calling path.
 */

So here, on a permission change to primary PTEs, the kernel first invalidates the related s2ptes, followed by ->change_pte() calls to synchronize the s2ptes. As a consequence of the invalidation we unmap huge PMDs, if a page falls in that range. Is the comment meant to point out the use of the logging flag under various scenarios?

The comment is there because when you look at this function it is not obvious why we pass logging_active=false, even though logging may actually be active. This could suggest that the parameter to stage2_set_pte() should be named differently (break_huge_pmds, or something like that), but we can also be satisfied with the comment.

Should I add comments on flag use in other places as well?

It's always a judgement call. I didn't find it necessary to put a comment elsewhere because I think it's pretty obvious that we would never care about logging writes to device regions.
However, this made me think, are we making sure that we are not marking device mappings as read-only in the wp_range functions? I think it's quite bad if we mark the VCPU interface as read-only for example. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
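To make the argument about the MMU-notifier path concrete, here is a small stand-alone model of the invariant being discussed. This is not the kernel code; the function names (`needs_pmd_break`, `mmu_notifier_needs_pmd_break`) are invented for illustration only — the real logic lives in stage2_set_pte() and the comment quoted above.

```c
#include <stdbool.h>

/* Hypothetical sketch, not kernel code: models when stage2_set_pte() would
 * have to break a huge PMD into page-granularity PTEs. Only the dirty-logging
 * path ever needs this, so that it can write-protect individual pages. */
static bool needs_pmd_break(bool logging_active, bool mapping_is_huge)
{
    return logging_active && mapping_is_huge;
}

/* Models the MMU-notifier path: ->change_pte() (which in turn calls
 * kvm_set_spte_hva()) runs only after the notifier has already unmapped any
 * huge PMD covering the address, so the mapping is never huge on this path
 * and passing logging_active == false is always safe. */
static bool mmu_notifier_needs_pmd_break(void)
{
    return needs_pmd_break(false /* logging_active */, false /* huge */);
}
```

This is why the comment above the call site is worth having: the invariant is enforced by the caller's ordering, not by the callee's arguments.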
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/12/2014 12:22 PM, Andy Lutomirski wrote:
On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:

What's the status of this series? I assume that it's too late for at least patches 2-5 to make it into 3.17.

Which tree were you hoping this patch series would go through?

I was assuming it would go through the x86 tree, since the bulk of the changes are in the x86 subsystem (hence my Acked-by). There's some argument that patch 1 should go through the kvm tree. There's no real need for patch 1 and 2-5 to end up in the same kernel release, either.

IIRC, Peter had some concerns, and I don't remember if they were all addressed. Peter?

I don't know. I rewrote one thing he didn't like and undid the other, but there's plenty of opportunity for this version to be problematic, too.

Sorry, I have been heads down on the current merge window. I will look at this for 3.18, presumably after Kernel Summit. The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness. Instead, it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different. I want to look over it more carefully before acking it, though.

Andy, are you going to be in Chicago?

-hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Aug 13, 2014 12:48 AM, H. Peter Anvin h...@zytor.com wrote:
On 08/12/2014 12:22 PM, Andy Lutomirski wrote:
On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:

What's the status of this series? I assume that it's too late for at least patches 2-5 to make it into 3.17.

Which tree were you hoping this patch series would go through?

I was assuming it would go through the x86 tree, since the bulk of the changes are in the x86 subsystem (hence my Acked-by). There's some argument that patch 1 should go through the kvm tree. There's no real need for patch 1 and 2-5 to end up in the same kernel release, either.

IIRC, Peter had some concerns, and I don't remember if they were all addressed. Peter?

I don't know. I rewrote one thing he didn't like and undid the other, but there's plenty of opportunity for this version to be problematic, too.

Sorry, I have been heads down on the current merge window. I will look at this for 3.18, presumably after Kernel Summit. The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness. Instead, it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different.

Fair enough. I meant seed as in something that initializes a PRNG (think srand), not seed as in a promised-to-be-cryptographically-secure seed for a DRBG. I can rename it, update the comment, or otherwise tweak it to make the intent clearer.

I want to look over it more carefully before acking it, though.

It would also be nice for someone with a Haswell box (and an RDSEED box) to test it. I have neither.

Andy, are you going to be in Chicago?

Yes.

-hpa
Re: The status about vhost-net on kvm-arm?
On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote:

Hello,

On Tue, Aug 12, 2014 at 5:41 AM, Li Liu john.li...@huawei.com wrote:

Hi all,

Can anyone tell me the current status of vhost-net on kvm-arm? Half a year has passed since Isa Ansharullah asked this question: http://www.spinics.net/lists/kvm-arm/msg08152.html

I have found two patches which provide kvm-arm support for eventfd and irqfd:

1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM
http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html

2) [RFC,v3] ARM: KVM: add irqfd and irq routing support
https://patches.linaro.org/32261/

And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan:

[Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio
https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

But there are no comments on this patch, and I can find nothing about qemu support for irqfd. Have I lost the track? If nobody is trying to fix this, we have a plan to complete it, with virtio-mmio supporting irqfd and multiqueue.

We at Virtual Open Systems did some work and tested vhost-net on ARM back in March. The setup was based on:
- host kernel with our ioeventfd patches: http://www.spinics.net/lists/kvm-arm/msg08413.html
- qemu with the aforementioned patches from Ying-Shiuan Pan: https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

The testbed was an ARM Chromebook with Exynos 5250, using a 1Gbps USB3 Ethernet adapter connected to a 1Gbps switch. I can't find the actual numbers, but I remember that with multiple streams the gain was clearly seen. Note that it used the minimum required ioeventfd implementation and not irqfd. I guess it is feasible to think that it can all be put together and rebased on the recent irqfd work; one could achieve even better performance because of the irqfd.
Managed to replicate the setup with the old versions we used in March. Single stream from another machine to the chromebook with the 1Gbps USB3 Ethernet adapter:

iperf -c address -P 1 -i 1 -p 5001 -f k -t 10
to HOST:  858316 Kbits/sec
to GUEST: 761563 Kbits/sec

10 parallel streams:

iperf -c address -P 10 -i 1 -p 5001 -f k -t 10
to HOST:  842420 Kbits/sec
to GUEST: 625144 Kbits/sec

___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

regards,
Nikolay Nikolaev
Virtual Open Systems
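As a quick sanity check on the figures above: the guest loses roughly 11% of host throughput with a single stream and roughly 26% with ten streams. The arithmetic (the helper name is ours, not from the thread):

```c
/* Back-of-the-envelope check on the iperf figures quoted above.
 * Returns the percentage of host throughput lost when running in the
 * guest; inputs are in Kbits/sec as reported by iperf -f k. */
static double overhead_pct(double host_kbits, double guest_kbits)
{
    return 100.0 * (host_kbits - guest_kbits) / host_kbits;
}
```

For example, overhead_pct(858316, 761563) comes out a bit above 11, and overhead_pct(842420, 625144) a bit below 26 — consistent with the observation that the multi-stream case is where irqfd would help most.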
[PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception
This patch emulates the debug registers and the debug exception to support guests using debug resources. This enables running gdb/kgdb etc. in a guest.

On the BOOKE architecture we cannot share debug resources between QEMU and the guest because:

When QEMU is using the debug resources, the debug exception must always be enabled. To achieve this we set MSR_DE and also set MSRP_DEP so the guest cannot change MSR_DE.

When emulating debug resources for the guest, we want the guest to control MSR_DE (enable/disable the debug interrupt on demand).

These two configurations cannot be supported at the same time, so the result is that we cannot share debug resources between QEMU and the guest on the BOOKE architecture. In the current design QEMU gets priority over the guest: if QEMU is using the debug resources then the guest cannot use them, and if the guest is using a debug resource then QEMU can overwrite it.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v3->v4
 - Clear only MRR on vcpu init

 arch/powerpc/include/asm/kvm_ppc.h   |   3 +
 arch/powerpc/include/asm/reg_booke.h |   2 +
 arch/powerpc/kvm/booke.c             |  42 +-
 arch/powerpc/kvm/booke_emulate.c     | 148 +++
 4 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fb86a22..05e58b6 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -206,6 +206,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server,
 extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
 extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);

+void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu);
+void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu);
+
 union kvmppc_one_reg {
 	u32	wval;
 	u64	dval;
diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index 464f108..150d485 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -307,6 +307,8 @@
 /*
  * DBSR bits which have conflicting
 * definitions on true Book E versus IBM 40x.
 */
#ifdef CONFIG_BOOKE
+#define DBSR_IDE	0x80000000	/* Imprecise Debug Event */
+#define DBSR_MRR	0x30000000	/* Most Recent Reset */
 #define DBSR_IC		0x08000000	/* Instruction Completion */
 #define DBSR_BT		0x04000000	/* Branch Taken */
 #define DBSR_IRPT	0x02000000	/* Exception Debug Event */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 074b7fc..6901862 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -267,6 +267,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu)
 	clear_bit(BOOKE_IRQPRIO_WATCHDOG, &vcpu->arch.pending_exceptions);
 }

+void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu)
+{
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG);
+}
+
+void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu)
+{
+	clear_bit(BOOKE_IRQPRIO_DEBUG, &vcpu->arch.pending_exceptions);
+}
+
 static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1)
 {
 	kvmppc_set_srr0(vcpu, srr0);
@@ -735,7 +745,32 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	struct debug_reg *dbg_reg = &(vcpu->arch.dbg_reg);
 	u32 dbsr = vcpu->arch.dbsr;

-	/* Clear guest dbsr (vcpu->arch.dbsr) */
+	if (vcpu->guest_debug == 0) {
+		/*
+		 * Debug resources belong to Guest.
+		 * Imprecise debug event is not injected
+		 */
+		if (dbsr & DBSR_IDE) {
+			dbsr &= ~DBSR_IDE;
+			if (!dbsr)
+				return RESUME_GUEST;
+		}
+
+		if (dbsr && (vcpu->arch.shared->msr & MSR_DE) &&
+		    (vcpu->arch.dbg_reg.dbcr0 & DBCR0_IDM))
+			kvmppc_core_queue_debug(vcpu);
+
+		/* Inject a program interrupt if trap debug is not allowed */
+		if ((dbsr & DBSR_TIE) && !(vcpu->arch.shared->msr & MSR_DE))
+			kvmppc_core_queue_program(vcpu, ESR_PTR);
+
+		return RESUME_GUEST;
+	}
+
+	/*
+	 * Debug resource owned by userspace.
+	 * Clear guest dbsr (vcpu->arch.dbsr)
+	 */
 	vcpu->arch.dbsr = 0;
 	run->debug.arch.status = 0;
 	run->debug.arch.address = vcpu->arch.pc;
@@ -1249,6 +1284,11 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
 	setup_timer(&vcpu->arch.wdt_timer, kvmppc_watchdog_func,
 		    (unsigned long)vcpu);

+	/*
+	 * Clear DBSR.MRR to avoid guest debug interrupt as
+	 * this is of host interest
+	 */
+	mtspr(SPRN_DBSR, DBSR_MRR);
 	return 0;
 }

diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index
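Abstracting away the kernel types, the routing decision the hunk above implements in kvmppc_handle_debug() can be sketched as a stand-alone model. This is a simplified illustration, not the kernel code: the enum and function names are invented; the DBSR bit values follow reg_booke.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define DBSR_IDE 0x80000000u  /* Imprecise Debug Event */
#define DBSR_TIE 0x01000000u  /* Trap Instruction Event */

enum debug_action {
    ACT_IGNORE,        /* nothing to deliver */
    ACT_QUEUE_DEBUG,   /* inject a debug interrupt into the guest */
    ACT_QUEUE_PROGRAM, /* inject a program interrupt (trap debug not allowed) */
    ACT_EXIT_TO_USER,  /* QEMU owns debug resources: report to userspace */
};

/* Simplified model of the dispatch in kvmppc_handle_debug(). */
static enum debug_action route_debug_event(bool qemu_owns_debug, uint32_t dbsr,
                                           bool msr_de, bool dbcr0_idm)
{
    if (qemu_owns_debug)       /* vcpu->guest_debug != 0: QEMU gets priority */
        return ACT_EXIT_TO_USER;

    dbsr &= ~DBSR_IDE;         /* imprecise debug events are not injected */
    if (!dbsr)
        return ACT_IGNORE;

    if (msr_de && dbcr0_idm)   /* guest has debug interrupts enabled itself */
        return ACT_QUEUE_DEBUG;

    if ((dbsr & DBSR_TIE) && !msr_de) /* trap while debug is not allowed */
        return ACT_QUEUE_PROGRAM;

    return ACT_IGNORE;
}
```

The key point mirrored here is the priority scheme from the commit message: when QEMU owns the debug resources, every event goes to userspace; only otherwise does the guest's own MSR_DE/DBCR0_IDM state decide what gets injected.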
[PATCH] KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
This was missed in the respective one_reg implementation patches.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 Documentation/virtual/kvm/api.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a21ff22..9177f23 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1878,6 +1878,8 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_ARCH_COMPAT | 32
   PPC   | KVM_REG_PPC_DABRX | 32
   PPC   | KVM_REG_PPC_WORT | 64
+  PPC   | KVM_REG_PPC_SPRG9 | 64
+  PPC   | KVM_REG_PPC_DBSR | 32
   PPC   | KVM_REG_PPC_TM_GPR0 | 64
   ...
   PPC   | KVM_REG_PPC_TM_GPR31 | 64
--
1.9.3
[PATCH v2] KVM: x86: check ISR and TMR to construct eoi exit bitmap
From: Yang Zhang yang.z.zh...@intel.com

The guest may mask an IOAPIC entry before issuing the EOI. In that case the EOI will not be intercepted by the hypervisor, because the corresponding bit in the EOI exit bitmap is not set. The solution is to check ISR + TMR to construct the EOI exit bitmap. This patch is a better fix for the issue that commit 0f6c0a740b tries to solve.

Tested-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 arch/x86/kvm/lapic.c | 17 +
 arch/x86/kvm/lapic.h |  2 ++
 arch/x86/kvm/x86.c   |  9 +
 virt/kvm/ioapic.c    |  7 ---
 4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 08e8a89..0ed4bcb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -515,6 +515,23 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
 }

+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+			      u32 *tmr)
+{
+	u32 i, reg_off, intr_in_service;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	for (i = 0; i < 8; i++) {
+		reg_off = 0x10 * i;
+		intr_in_service = apic_read_reg(apic, APIC_ISR + reg_off) &
+				  kvm_apic_get_reg(apic, APIC_TMR + reg_off);
+		if (intr_in_service) {
+			*((u32 *)eoi_exit_bitmap + i) |= intr_in_service;
+			tmr[i] |= intr_in_service;
+		}
+	}
+}
+
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..4ee3d70 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -53,6 +53,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);

+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+			      u32 *tmr);
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 204422d..755b556 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6005,6 +6005,15 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	memset(tmr, 0, 32);

 	kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
+	/*
+	 * The guest may mask an IOAPIC entry before issuing the EOI. In that
+	 * case the EOI will not be intercepted by the hypervisor, because the
+	 * corresponding bit in the EOI exit bitmap is not set.
+	 *
+	 * The solution is to check ISR + TMR to construct the EOI exit bitmap.
+	 */
+	kvm_apic_zap_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr);
+
 	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 	kvm_apic_update_tmr(vcpu, tmr);
 }
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index e8ce34c..2458a1d 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
 	spin_lock(&ioapic->lock);
 	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
 		e = &ioapic->redirtbl[index];
-		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
-		    index == RTC_GSI) {
+		if (!e->fields.mask &&
+		    (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+		     kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
+					  index) || index == RTC_GSI)) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
 				e->fields.dest_id, e->fields.dest_mode)) {
 				__set_bit(e->fields.vector,
--
1.7.1
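The core of the fix is folding in-service, level-triggered vectors back into the EOI exit bitmap. Stripped of the kernel types, the loop reduces to the following stand-alone sketch (the function name is ours; the 8-register layout mirrors the 0x10-spaced ISR/TMR registers iterated over in the patch):

```c
#include <stdint.h>

/* 256 APIC vectors tracked as 8 x 32-bit registers, matching the
 * APIC_ISR/APIC_TMR register layout walked by kvm_apic_zap_eoi_exitmap(). */
static void fold_isr_tmr_into_eoi_exitmap(const uint32_t isr[8],
                                          uint32_t tmr[8],
                                          uint32_t eoi_exit_bitmap[8])
{
    for (int i = 0; i < 8; i++) {
        /* vectors that are both in service (ISR) and level-triggered (TMR) */
        uint32_t intr_in_service = isr[i] & tmr[i];

        if (intr_in_service) {
            eoi_exit_bitmap[i] |= intr_in_service; /* keep intercepting EOI */
            tmr[i] |= intr_in_service;             /* preserve level semantics */
        }
    }
}
```

This guarantees that a vector already in service keeps its EOI intercepted even if the guest has since masked the IOAPIC entry, which is exactly the window the commit message describes.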
[PATCH 1/4] VFIO: PLATFORM: Add device tree info API and skeleton
This patch introduces the API to return device tree info about a PLATFORM device (if described by a device tree) and the skeleton of the implementation for VFIO_PLATFORM. Information about any device node bound by VFIO_PLATFORM should be queried via the introduced ioctl VFIO_DEVICE_GET_DEVTREE_INFO.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/Makefile                |  2 +-
 drivers/vfio/platform/devtree.c               | 27 ++
 drivers/vfio/platform/vfio_platform.c         | 11 +
 drivers/vfio/platform/vfio_platform_private.h |  7 ++
 include/uapi/linux/vfio.h                     | 32 ---
 5 files changed, 75 insertions(+), 4 deletions(-)
 create mode 100644 drivers/vfio/platform/devtree.c

diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
index 2c53327..4313fd7 100644
--- a/drivers/vfio/platform/Makefile
+++ b/drivers/vfio/platform/Makefile
@@ -1,4 +1,4 @@
-vfio-platform-y := vfio_platform.o vfio_platform_irq.o
+vfio-platform-y := vfio_platform.o vfio_platform_irq.o devtree.o

 obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
diff --git a/drivers/vfio/platform/devtree.c b/drivers/vfio/platform/devtree.c
new file mode 100644
index 000..91cab88
--- /dev/null
+++ b/drivers/vfio/platform/devtree.c
@@ -0,0 +1,27 @@
+#include <linux/slab.h>
+#include <linux/vfio.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include "vfio_platform_private.h"
+
+void vfio_platform_devtree_get(struct vfio_platform_device *vdev)
+{
+	vdev->of_node = of_node_get(vdev->pdev->dev.of_node);
+}
+
+void vfio_platform_devtree_put(struct vfio_platform_device *vdev)
+{
+	of_node_put(vdev->of_node);
+	vdev->of_node = NULL;
+}
+
+bool vfio_platform_has_devtree(struct vfio_platform_device *vdev)
+{
+	return !!vdev->of_node;
+}
+
+long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
+				 unsigned long arg)
+{
+	return -EINVAL; /* not implemented yet */
+}
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index f4c06c6..e6fe05a 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -26,6 +26,7 @@
 #include <linux/vfio.h>
 #include <linux/io.h>
 #include <linux/platform_device.h>
+#include <linux/of.h>
 #include <linux/irq.h>

 #include "vfio_platform_private.h"
@@ -66,6 +67,9 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)

 	vdev->num_regions = cnt;

+	/* get device tree node for info if available */
+	vfio_platform_devtree_get(vdev);
+
 	return 0;
 err:
 	kfree(vdev->region);
@@ -74,6 +78,7 @@ err:

 static void vfio_platform_regions_cleanup(struct vfio_platform_device *vdev)
 {
+	vfio_platform_devtree_put(vdev);
 	vdev->num_regions = 0;
 	kfree(vdev->region);
 }
@@ -132,6 +137,9 @@ static long vfio_platform_ioctl(void *device_data,
 			return -EINVAL;

 		info.flags = VFIO_DEVICE_FLAGS_PLATFORM;
+		if (vfio_platform_has_devtree(vdev))
+			info.flags |= VFIO_DEVICE_FLAGS_DEVTREE;
+
 		info.num_regions = vdev->num_regions;
 		info.num_irqs = vdev->num_irqs;

@@ -210,6 +218,9 @@ static long vfio_platform_ioctl(void *device_data,

 		return ret;

+	} else if (cmd == VFIO_DEVICE_GET_DEVTREE_INFO) {
+		return vfio_platform_devtree_ioctl(vdev, arg);
+
 	} else if (cmd == VFIO_DEVICE_RESET)
 		return -EINVAL;

diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 86a9201..1c42ba0 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -49,6 +49,7 @@ struct vfio_platform_device {
 	u32				num_regions;
 	struct vfio_platform_irq	*irq;
 	u32				num_irqs;
+	struct device_node		*of_node;
 };

 extern int vfio_platform_irq_init(struct vfio_platform_device *vdev);
@@ -59,4 +60,10 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
 					uint32_t flags, unsigned index,
 					unsigned start, unsigned count,
 					void *data);
+/* device tree info support in devtree.c */
+extern void vfio_platform_devtree_get(struct vfio_platform_device *vdev);
+extern void vfio_platform_devtree_put(struct vfio_platform_device *vdev);
+extern bool vfio_platform_has_devtree(struct vfio_platform_device *vdev);
+extern long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
+					unsigned long arg);
 #endif /* VFIO_PLATFORM_PRIVATE_H */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d381107..60f66ec 100644
--- a/include/uapi/linux/vfio.h
+++
[RFC 0/4] VFIO: PLATFORM: Return device tree info for a platform device node
This RFC's intention is to show what an interface to access device node properties for VFIO_PLATFORM can look like. If a device tree node corresponding to a platform device bound by VFIO_PLATFORM is available, this patch series allows the user to query the properties associated with this device node. This can be useful for userspace drivers that want to automatically query parameters related to the device.

An API to return data from a device's device tree has been proposed before on these lists. The API proposed here is slightly different: properties to parse from the device tree are not indexed by a numerical id. The host system doesn't guarantee any specific ordering for the available properties, or that those will remain the same; while this does not happen in practice, there is nothing stopping the host from changing the device nodes during operation. So properties are accessed by property name, and the type of the property accessed must also be known by the user.

Property types implemented in this RFC:
- VFIO_DEVTREE_ARR_TYPE_STRING (strings separated by the null character)
- VFIO_DEVTREE_ARR_TYPE_U32
- VFIO_DEVTREE_ARR_TYPE_U16
- VFIO_DEVTREE_ARR_TYPE_U8

These can all be accessed via the ioctl VFIO_DEVICE_GET_DEVTREE_INFO. A new ioctl was preferred instead of shoehorning the functionality into VFIO_DEVICE_GET_INFO. The structure exchanged looks like this:

/**
 * VFIO_DEVICE_GET_DEVTREE_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 16,
 *				       struct vfio_devtree_info)
 *
 * Retrieve information from the device's device tree, if available.
 * Caller will initialize data[] with a single string with the requested
 * devicetree property name, and type depending on whether an array of strings
 * or an array of u32 values is expected. On success, data[] will be extended
 * with the requested information, either as an array of u32, or with a list
 * of strings separated by the NULL terminating character.
 * Return: 0 on success, -errno on failure.
 */
struct vfio_devtree_info {
	__u32 argsz;
	__u32 type;
#define VFIO_DEVTREE_PROP_NAMES		0
#define VFIO_DEVTREE_ARR_TYPE_STRING	1
#define VFIO_DEVTREE_ARR_TYPE_U8	2
#define VFIO_DEVTREE_ARR_TYPE_U16	3
#define VFIO_DEVTREE_ARR_TYPE_U32	4
	__u32 length;
	__u8  data[];
};

#define VFIO_DEVICE_GET_DEVTREE_INFO	_IO(VFIO_TYPE, VFIO_BASE + 17)

The length of the property will be reported in length, so the user can reallocate the structure if the data does not fit the first time the call is used.

Specifically for QEMU, reading the compatible property of the device tree node could be of use to find out what device is being assigned to the guest, to handle a wider range of devices appropriately in the future, and to generate an appropriate device tree for the guest.

Antonios Motakis (4):
  VFIO: PLATFORM: Add device tree info API and skeleton
  VFIO: PLATFORM: DEVTREE: Return available property names
  VFIO: PLATFORM: DEVTREE: Access property as a list of strings
  VFIO: PLATFORM: DEVTREE: Return arrays of u32, u16, or u8

 drivers/vfio/platform/Makefile                |   2 +-
 drivers/vfio/platform/devtree.c               | 252 ++
 drivers/vfio/platform/vfio_platform.c         |  11 ++
 drivers/vfio/platform/vfio_platform_private.h |   7 +
 include/uapi/linux/vfio.h                     |  32 +++-
 5 files changed, 300 insertions(+), 4 deletions(-)
 create mode 100644 drivers/vfio/platform/devtree.c
--
1.8.3.2
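Since length reports the real property size on overflow, a userspace caller would typically use a grow-and-retry loop around the ioctl. Below is a sketch of that pattern with the real ioctl replaced by a hypothetical stub: fake_devtree_ioctl(), the "arm,pl011" compatible string, and query_compatible() are all invented for illustration and are not part of this RFC.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors struct vfio_devtree_info from the RFC above. */
struct vfio_devtree_info {
    uint32_t argsz;
    uint32_t type;
    uint32_t length;
    uint8_t  data[];
};

/* Hypothetical stand-in for ioctl(fd, VFIO_DEVICE_GET_DEVTREE_INFO, info):
 * always reports the full property length, but fills data[] only if it fits. */
static int fake_devtree_ioctl(struct vfio_devtree_info *info)
{
    static const char prop[] = "arm,pl011\0arm,primecell"; /* fake "compatible" */

    info->length = sizeof(prop);
    if (info->argsz < sizeof(*info) + sizeof(prop))
        return -EAGAIN;            /* buffer too small; length tells us how much */
    memcpy(info->data, prop, sizeof(prop));
    return 0;
}

/* Grow-and-retry: start with a header-only buffer, then enlarge it to the
 * length reported back until the property fits. Caller frees the result. */
static struct vfio_devtree_info *query_compatible(void)
{
    size_t sz = sizeof(struct vfio_devtree_info);
    struct vfio_devtree_info *info = calloc(1, sz);

    if (!info)
        return NULL;

    for (;;) {
        int ret;

        info->argsz = sz;
        ret = fake_devtree_ioctl(info);
        if (ret == 0)
            return info;
        if (ret != -EAGAIN) {
            free(info);
            return NULL;
        }
        sz = sizeof(*info) + info->length;
        info = realloc(info, sz);
        if (!info)
            return NULL;
    }
}
```

The same loop shape works for every property type in the RFC, since -EAGAIN plus an updated length is the uniform overflow signal.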
[PATCH 4/4] VFIO: PLATFORM: DEVTREE: Return arrays of u32, u16, or u8
Certain properties of a device tree node are accessible as an array of unsigned integers, either u32, u16, or u8. Let the VFIO user query this type of device node properties.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/devtree.c | 99 +
 1 file changed, 99 insertions(+)

diff --git a/drivers/vfio/platform/devtree.c b/drivers/vfio/platform/devtree.c
index 80c60d4..331cc34 100644
--- a/drivers/vfio/platform/devtree.c
+++ b/drivers/vfio/platform/devtree.c
@@ -98,6 +98,96 @@ static int devtree_get_full_name(struct device_node *np, void __user *datap,
 	return 0;
 }

+static int devtree_get_u32_arr(const struct device_node *np, const char *name,
+			       void __user *datap, unsigned long datasz)
+{
+	int ret;
+	int n;
+	u32 *out;
+
+	n = of_property_count_elems_of_size(np, name, sizeof(u32));
+	if (n < 0)
+		return n;
+
+	if (n * sizeof(u32) > datasz)
+		return -EAGAIN;
+
+	out = kcalloc(n, sizeof(u32), GFP_KERNEL);
+	if (!out)
+		return -EFAULT;
+
+	ret = of_property_read_u32_array(np, name, out, n);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(datap, out, n * sizeof(u32)))
+		ret = -EFAULT;
+
+out:
+	kfree(out);
+	return ret;
+}
+
+static int devtree_get_u16_arr(const struct device_node *np, const char *name,
+			       void __user *datap, unsigned long datasz)
+{
+	int ret;
+	int n;
+	u16 *out;
+
+	n = of_property_count_elems_of_size(np, name, sizeof(u16));
+	if (n < 0)
+		return n;
+
+	if (n * sizeof(u16) > datasz)
+		return -EAGAIN;
+
+	out = kcalloc(n, sizeof(u16), GFP_KERNEL);
+	if (!out)
+		return -EFAULT;
+
+	ret = of_property_read_u16_array(np, name, out, n);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(datap, out, n * sizeof(u16)))
+		ret = -EFAULT;
+
+out:
+	kfree(out);
+	return ret;
+}
+
+static int devtree_get_u8_arr(const struct device_node *np, const char *name,
+			      void __user *datap, unsigned long datasz)
+{
+	int ret;
+	int n;
+	u8 *out;
+
+	n = of_property_count_elems_of_size(np, name, sizeof(u8));
+	if (n < 0)
+		return n;
+
+	if (n * sizeof(u8) > datasz)
+		return -EAGAIN;
+
+	out = kcalloc(n, sizeof(u8), GFP_KERNEL);
+	if (!out)
+		return -EFAULT;
+
+	ret = of_property_read_u8_array(np, name, out, n);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(datap, out, n * sizeof(u8)))
+		ret = -EFAULT;
+
+out:
+	kfree(out);
+	return ret;
+}
+
 long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 				 unsigned long arg)
 {
@@ -143,6 +233,15 @@ long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 	} else if (info.type == VFIO_DEVTREE_ARR_TYPE_STRING)
 		ret = devtree_get_strings(vdev->of_node, name, datap, datasz);

+	else if (info.type == VFIO_DEVTREE_ARR_TYPE_U32)
+		ret = devtree_get_u32_arr(vdev->of_node, name, datap, datasz);
+
+	else if (info.type == VFIO_DEVTREE_ARR_TYPE_U16)
+		ret = devtree_get_u16_arr(vdev->of_node, name, datap, datasz);
+
+	else if (info.type == VFIO_DEVTREE_ARR_TYPE_U8)
+		ret = devtree_get_u8_arr(vdev->of_node, name, datap, datasz);
+
 	kfree(name);

 out:
--
1.8.3.2
[PATCH 3/4] VFIO: PLATFORM: DEVTREE: Access property as a list of strings
Certain device tree properties (e.g. the device node name, the compatible string) are available as a list of strings (separated by the null terminating character). Let the VFIO user query this type of properties.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/devtree.c | 60 +
 1 file changed, 60 insertions(+)

diff --git a/drivers/vfio/platform/devtree.c b/drivers/vfio/platform/devtree.c
index b8fd4138..80c60d4 100644
--- a/drivers/vfio/platform/devtree.c
+++ b/drivers/vfio/platform/devtree.c
@@ -61,6 +61,43 @@ static int devtree_get_prop_names(struct device_node *np, void __user *datap,
 	return ret;
 }

+static int devtree_get_strings(struct device_node *np, char *name,
+			       void __user *datap, unsigned long datasz)
+{
+	struct property *prop;
+	int len;
+
+	prop = of_find_property(np, name, &len);
+
+	if (!prop)
+		return -EINVAL;
+
+	if (len > datasz)
+		return -EAGAIN;
+
+	if (copy_to_user(datap, prop->value, len))
+		return -EFAULT;
+	else
+		return 0;
+}
+
+static int devtree_get_full_name(struct device_node *np, void __user *datap,
+				 unsigned long datasz, int *lenp)
+{
+	int len = strlen(np->full_name) + 1;
+
+	if (lenp)
+		*lenp = len;
+
+	if (len > datasz)
+		return -EAGAIN;
+
+	if (copy_to_user(datap, np->full_name, len))
+		return -EFAULT;
+
+	return 0;
+}
+
 long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 				 unsigned long arg)
 {
@@ -68,6 +105,7 @@ long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 	unsigned long minsz = offsetofend(struct vfio_devtree_info, length);
 	void __user *datap = (void __user *) arg + minsz;
 	unsigned long int datasz;
+	char *name;
 	int ret = -EINVAL;

 	if (!vfio_platform_has_devtree(vdev))
@@ -84,8 +122,30 @@ long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev,
 	if (info.type == VFIO_DEVTREE_PROP_NAMES) {
 		ret = devtree_get_prop_names(vdev->of_node, datap, datasz,
 					     &info.length);
+		goto out;
 	}

+	name = kzalloc(datasz, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	if (copy_from_user(name, datap, datasz))
+		return -EFAULT;
+
+	if (!of_find_property(vdev->of_node, name, &info.length)) {
+		/* special case full_name as a property that is not on the fdt,
+		 * but we wish to return to the user as it includes the full
+		 * path of the device */
+		if (!strcmp(name, "full_name") &&
+		    (info.type == VFIO_DEVTREE_ARR_TYPE_STRING))
+			ret = devtree_get_full_name(vdev->of_node, datap,
+						    datasz, &info.length);
+
+	} else if (info.type == VFIO_DEVTREE_ARR_TYPE_STRING)
+		ret = devtree_get_strings(vdev->of_node, name, datap, datasz);
+
+	kfree(name);
+
+out:
 	if (copy_to_user((void __user *)arg, &info, minsz))
 		ret = -EFAULT;
--
1.8.3.2
[PATCH 2/4] VFIO: PLATFORM: DEVTREE: Return available property names
For various reasons, the available properties of the platform device node in the device tree node should be referred to by the property name. Passing type = VFIO_DEVTREE_PROP_NAMES to VFIO_DEVICE_GET_DEVTREE_INFO, returns a list of strings with the available properties that the VFIO user can access. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/platform/devtree.c | 68 - 1 file changed, 67 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/platform/devtree.c b/drivers/vfio/platform/devtree.c index 91cab88..b8fd4138 100644 --- a/drivers/vfio/platform/devtree.c +++ b/drivers/vfio/platform/devtree.c @@ -20,8 +20,74 @@ bool vfio_platform_has_devtree(struct vfio_platform_device *vdev) return !!vdev-of_node; } +static int devtree_get_prop_names(struct device_node *np, void __user *datap, + unsigned long datasz, int *lenp) +{ + struct property *prop; + int len = 0, sz; + int ret = 0; + + for_each_property_of_node(np, prop) { + sz = strlen(prop-name) + 1; + + if (datasz sz) { + ret = -EAGAIN; + break; + } + + if (copy_to_user(datap, prop-name, sz)) + return -EFAULT; + + datap += sz; + datasz -= sz; + len += sz; + } + + /* if overflow occurs, calculate remaining length */ + while (prop) { + len += strlen(prop-name) + 1; + prop = prop-next; + } + + /* we expose the full_name in addition to the usual properties */ + len += sz = strlen(full_name) + 1; + if (datasz sz) { + ret = -EAGAIN; + } else if (copy_to_user(datap, full_name, sz)) + return -EFAULT; + + *lenp = len; + + return ret; +} + long vfio_platform_devtree_ioctl(struct vfio_platform_device *vdev, unsigned long arg) { - return -EINVAL; /* not implemented yet */ + struct vfio_devtree_info info; + unsigned long minsz = offsetofend(struct vfio_devtree_info, length); + void __user *datap = (void __user *) arg + minsz; + unsigned long int datasz; + int ret = -EINVAL; + + if (!vfio_platform_has_devtree(vdev)) + return -EINVAL; + + if (copy_from_user(info, (void __user *)arg, minsz)) + 
return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + datasz = info.argsz - minsz; + + if (info.type == VFIO_DEVTREE_PROP_NAMES) { + ret = devtree_get_prop_names(vdev->of_node, datap, datasz, + &info.length); + } + + if (copy_to_user((void __user *)arg, &info, minsz)) + ret = -EFAULT; + + return ret; } -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The status about vhost-net on kvm-arm?
On 2014/8/13 17:10, Nikolay Nikolaev wrote: On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: Hello, On Tue, Aug 12, 2014 at 5:41 AM, Li Liu john.li...@huawei.com wrote: Hi all, Can anyone tell me the current status of vhost-net on kvm-arm? Half a year has passed since Isa Ansharullah asked this question: http://www.spinics.net/lists/kvm-arm/msg08152.html I have found two patches which have provided the kvm-arm support of eventfd and irqfd: 1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html 2) [RFC,v3] ARM: KVM: add irqfd and irq routing support https://patches.linaro.org/32261/ And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan: [Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html But there are no comments on this patch, and I can find nothing about qemu support for irqfd. Have I lost track? If nobody is trying to fix it, we have a plan to complete irqfd and multiqueue support for virtio-mmio. we at Virtual Open Systems did some work and tested vhost-net on ARM back in March. The setup was based on: - host kernel with our ioeventfd patches: http://www.spinics.net/lists/kvm-arm/msg08413.html - qemu with the aforementioned patches from Ying-Shiuan Pan https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html The testbed was an ARM Chromebook with Exynos 5250, using a 1Gbps USB3 Ethernet adapter connected to a 1Gbps switch. I can't find the actual numbers but I remember that with multiple streams the gain was clearly seen. Note that it used the minimum required ioeventfd implementation and not irqfd. I guess it is feasible to think that it all can be put together and rebased + the recent irqfd work. One can achieve even better performance (because of the irqfd).
Managed to replicate the setup with the old versions we used in March: Single stream from another machine to chromebook with 1Gbps USB3 Ethernet adapter. iperf -c address -P 1 -i 1 -p 5001 -f k -t 10 to HOST: 858316 Kbits/sec to GUEST: 761563 Kbits/sec 10 parallel streams iperf -c address -P 10 -i 1 -p 5001 -f k -t 10 to HOST: 842420 Kbits/sec to GUEST: 625144 Kbits/sec Appreciate your work. Is it convenient for you to test the same cases without vhost=on? Then the results will clearly show the performance improvement from ioeventfd alone. I will try to test it with a Hisilicon board, which is ongoing. Best regards Li ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm regards, Nikolay Nikolaev Virtual Open Systems .
[PATCH] Test case for VFIO_PLATFORM returning device tree info
This is a test case complementing the patch series: [RFC 0/4] VFIO: PLATFORM: Return device tree info for a platform device node This test case is based on the ARM PL330 DMA controller and shows how device node properties can be accessed from userspace. It doesn't apply on anything in particular; it is sent in patch format on the ML merely for reading convenience. It can be pulled from: g...@github.com:virtualopensystems/vfio-devtree-test.git --- vfio-dt.c | 172 ++ 1 file changed, 172 insertions(+) create mode 100644 vfio-dt.c diff --git a/vfio-dt.c b/vfio-dt.c new file mode 100644 index 000..6e671dd --- /dev/null +++ b/vfio-dt.c @@ -0,0 +1,172 @@ +#include <stdio.h> +#include <sys/fcntl.h> +#include <sys/mman.h> +#include <linux/vfio.h> +#include <sys/eventfd.h> +#include <stdlib.h> +#include <unistd.h> +#include <errno.h> +#include <string.h> + +#define VFIO_DEVICE_FLAGS_DEVTREE (1 << 3) /* device tree metadata */ + +struct vfio_devtree_info { + __u32 argsz; + __u32 type; +#define VFIO_DEVTREE_PROP_NAMES 0 +#define VFIO_DEVTREE_ARR_TYPE_STRING 1 +#define VFIO_DEVTREE_ARR_TYPE_U8 2 +#define VFIO_DEVTREE_ARR_TYPE_U16 3 +#define VFIO_DEVTREE_ARR_TYPE_U32 4 + __u32 length; + __u8 data[]; +}; +#define VFIO_DEVICE_GET_DEVTREE_INFO _IO(VFIO_TYPE, VFIO_BASE + 17) + +static void vfio_pr_devtree_prop(int device, char *name, unsigned int type) +{ + static unsigned int length = 0; + static struct vfio_devtree_info *devtree = NULL; + int ret; + + if (length < strlen(name) + 1) + length = strlen(name) + 1; + + while (1) { + unsigned int argsz = sizeof(struct vfio_devtree_info) + length; + devtree = realloc(devtree, argsz); + devtree->argsz = argsz; + devtree->type = type; + strcpy(devtree->data, name); + + ret = ioctl(device, VFIO_DEVICE_GET_DEVTREE_INFO, devtree); + + if (length < devtree->length) + length = devtree->length; + else + break; + } + + if (ret) { + printf("%s = error %d\n", name, ret); + } else if (type == VFIO_DEVTREE_ARR_TYPE_STRING || + type == VFIO_DEVTREE_PROP_NAMES) { + int i; + printf("%s =",
name); + for (i = 0; i < devtree->length; i += strlen(devtree->data + i) + 1) + printf(" \"%s\"", devtree->data + i); + printf("\n"); + + } else if (type == VFIO_DEVTREE_ARR_TYPE_U32 || + type == VFIO_DEVTREE_ARR_TYPE_U16 || + type == VFIO_DEVTREE_ARR_TYPE_U8) { + long unsigned int *uarr = (long unsigned int *) devtree->data; + + printf("%s =", name); + while ((__u8 *)uarr < devtree->data + devtree->length) { + printf(" 0x%lx", *uarr); + uarr++; + } + printf("\n"); + } +} + +int main (int argc, char **argv) { + + int container, group, device; + unsigned int i; + + struct vfio_group_status group_status = { .argsz = sizeof(group_status) }; + struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) }; + struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) }; + struct vfio_device_info device_info = { .argsz = sizeof(device_info) }; + + int ret; + + if (argc != 3) { + printf("Usage: ./vfio-dt /dev/vfio/${group_id} device_id\n"); + return 2; + } + + /* Create a new container */ + container = open("/dev/vfio/vfio", O_RDWR); + + if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION) { + printf("Unknown API version\n"); + return 1; + } + + if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) { + printf("Doesn't support the IOMMU driver we want\n"); + return 1; + } + + /* Open the group */ + group = open(argv[1], O_RDWR); + + /* Test the group is viable and available */ + ioctl(group, VFIO_GROUP_GET_STATUS, &group_status); + + if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) { + printf("Group is not viable (not all devices bound for vfio)\n"); + return 1; + } + + /* Add the group to the container */ + ioctl(group, VFIO_GROUP_SET_CONTAINER, &container); + + /* Enable the IOMMU model we want */ + ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU); + + /* Get additional IOMMU info */ + ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info); + + /* Get a file descriptor for the device */ + device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, argv[2]); + printf("=== VFIO device file
descriptor %d ===\n", device); + + /* Test and setup the
Re: The status about vhost-net on kvm-arm?
On Wed, Aug 13, 2014 at 12:10 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: Hello, On Tue, Aug 12, 2014 at 5:41 AM, Li Liu john.li...@huawei.com wrote: Hi all, Can anyone tell me the current status of vhost-net on kvm-arm? Half a year has passed since Isa Ansharullah asked this question: http://www.spinics.net/lists/kvm-arm/msg08152.html I have found two patches which have provided the kvm-arm support of eventfd and irqfd: 1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html 2) [RFC,v3] ARM: KVM: add irqfd and irq routing support https://patches.linaro.org/32261/ And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan: [Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html But there are no comments on this patch, and I can find nothing about qemu support for irqfd. Have I lost track? If nobody is trying to fix it, we have a plan to complete irqfd and multiqueue support for virtio-mmio. we at Virtual Open Systems did some work and tested vhost-net on ARM back in March. The setup was based on: - host kernel with our ioeventfd patches: http://www.spinics.net/lists/kvm-arm/msg08413.html - qemu with the aforementioned patches from Ying-Shiuan Pan https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html The testbed was an ARM Chromebook with Exynos 5250, using a 1Gbps USB3 Ethernet adapter connected to a 1Gbps switch. I can't find the actual numbers but I remember that with multiple streams the gain was clearly seen. Note that it used the minimum required ioeventfd implementation and not irqfd. I guess it is feasible to think that it all can be put together and rebased + the recent irqfd work. One can achieve even better performance (because of the irqfd).
Managed to replicate the setup with the old versions we used in March: Single stream from another machine to chromebook with 1Gbps USB3 Ethernet adapter. iperf -c address -P 1 -i 1 -p 5001 -f k -t 10 to HOST: 858316 Kbits/sec to GUEST: 761563 Kbits/sec to GUEST vhost=off: 508150 Kbits/sec 10 parallel streams iperf -c address -P 10 -i 1 -p 5001 -f k -t 10 to HOST: 842420 Kbits/sec to GUEST: 625144 Kbits/sec to GUEST vhost=off: 425276 Kbits/sec ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm regards, Nikolay Nikolaev Virtual Open Systems
Re: [PATCH v4] arm64: fix VTTBR_BADDR_MASK
On Tue, Aug 12, 2014 at 06:05:21PM +0200, Christoffer Dall wrote: On Mon, Aug 11, 2014 at 03:38:23PM -0500, Joel Schopp wrote: The current VTTBR_BADDR_MASK only masks 39 bits, which is broken on current systems. Rather than just add a bit, it seems like a good time to also set things at run-time instead of compile time to accommodate more hardware. This patch sets TCR_EL2.PS, VTCR_EL2.T0SZ and vttbr_baddr_mask at runtime, not compile time. In ARMv8, the EL2 physical address size (TCR_EL2.PS) and stage2 input address size (VTCR_EL2.T0SZ) cannot be determined at compile time since they depend on hardware capability. According to Table D4-23 and Table D4-25 in the ARM DDI 0487A.b document, vttbr_x is calculated using different fixed values with consideration of T0SZ, granule size and the level of translation tables. Therefore, vttbr_baddr_mask should be determined dynamically. Changes since v3: Another rebase Addressed minor comments from v2 Changes since v2: Rebased on https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next branch Changes since v1: Rebased fix on Jungseok Lee's patch https://lkml.org/lkml/2014/5/12/189 to provide a better long term fix. Updated that patch to log an error instead of silently failing on an unaligned vttbr.
Cc: Christoffer Dall christoffer.d...@linaro.org Cc: Sungjinn Chung sungjinn.ch...@samsung.com Signed-off-by: Jungseok Lee jays@samsung.com Signed-off-by: Joel Schopp joel.sch...@amd.com --- arch/arm/kvm/arm.c | 116 +- arch/arm64/include/asm/kvm_arm.h | 17 +- arch/arm64/kvm/hyp-init.S| 20 +-- 3 files changed, 131 insertions(+), 22 deletions(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 3c82b37..b4859fa 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -37,6 +37,7 @@ #include <asm/mman.h> #include <asm/tlbflush.h> #include <asm/cacheflush.h> +#include <asm/cputype.h> #include <asm/virt.h> #include <asm/kvm_arm.h> #include <asm/kvm_asm.h> @@ -61,6 +62,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1); static u8 kvm_next_vmid; static DEFINE_SPINLOCK(kvm_vmid_lock); +static u64 vttbr_baddr_mask; + static bool vgic_present; static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu) @@ -412,6 +415,103 @@ static bool need_new_vmid_gen(struct kvm *kvm) return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen)); } + + + /* +* ARMv8 64K architecture limitations: +* 16 <= T0SZ <= 21 is valid under 3 levels of translation tables +* 18 <= T0SZ <= 34 is valid under 2 levels of translation tables +* 31 <= T0SZ <= 39 is valid under 1 level of translation tables +* +* ARMv8 4K architecture limitations: +* 16 <= T0SZ <= 24 is valid under 4 levels of translation tables +* 21 <= T0SZ <= 30 is valid under 3 levels of translation tables this is still wrong, as I pointed out, it should be 21 <= T0SZ <= 30 typo: I meant: 21 <= T0SZ <= 33 -Christoffer
Re: [PATCH] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy a...@ozlabs.ru writes: fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change but this is not true as it calls get_order() (which takes bytes) where it should have called ilog2() and the kernel stops on VM_BUG_ON(). This replaces get_order() with ilog2(). Should we round it up? i.e., ilog2(kvm_rma_pages - 1) + 1? Suggested-by: Paul Mackerras pau...@samba.org Cc: Alexander Graf ag...@suse.de Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Cc: Joonsoo Kim iamjoonsoo@lge.com Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: sta...@vger.kernel.org Why stable? We merged it this merge window. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 329d7fd..bfe9f01 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma() ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL); if (!ri) return NULL; - page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages)); + page = cma_alloc(kvm_cma, kvm_rma_pages, ilog2(kvm_rma_pages)); if (!page) goto err_out; atomic_set(&ri->use_count, 1); @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages) { unsigned long align_pages = HPT_ALIGN_PAGES; - VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); + VM_BUG_ON(ilog2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); /* Old CPUs require HPT aligned on a multiple of its size */ if (!cpu_has_feature(CPU_FTR_ARCH_206)) align_pages = nr_pages; - return cma_alloc(kvm_cma, nr_pages, ilog2(align_pages)); } EXPORT_SYMBOL_GPL(kvm_alloc_hpt); -- 2.0.0
Re: [PATCH] vhost: Add polling mode
On Tue, Aug 12, 2014 at 01:57:05PM +0300, Razya Ladelsky wrote: Michael S. Tsirkin m...@redhat.com wrote on 12/08/2014 12:18:50 PM: From: Michael S. Tsirkin m...@redhat.com To: David Miller da...@davemloft.net Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org, Alex Glikson/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, abel.gor...@gmail.com, linux-ker...@vger.kernel.org, net...@vger.kernel.org, virtualizat...@lists.linux-foundation.org Date: 12/08/2014 12:18 PM Subject: Re: [PATCH] vhost: Add polling mode On Mon, Aug 11, 2014 at 12:46:21PM -0700, David Miller wrote: From: Michael S. Tsirkin m...@redhat.com Date: Sun, 10 Aug 2014 21:45:59 +0200 On Sun, Aug 10, 2014 at 11:30:35AM +0300, Razya Ladelsky wrote: ... And, did your tests actually produce 100% load on both host CPUs? ... Michael, please do not quote an entire patch just to ask a one line question. I truly, truly, wish it was simpler in modern email clients to delete the unrelated quoted material because I bet when people do this they are simply being lazy. Thank you. Lazy - mea culpa, though I'm using mutt so it isn't even hard. The question still stands: the test results are only valid if CPU was at 100% in all configurations. This is the reason I generally prefer it when people report throughput divided by CPU (power would be good too but it still isn't easy for people to get that number). Hi Michael, Sorry for the delay, had some problems with my mailbox, and I realized just now that my reply wasn't sent. The vm indeed ALWAYS utilized 100% cpu, whether polling was enabled or not. The vhost thread utilized less than 100% (of the other cpu) when polling was disabled. Enabling polling increased its utilization to 100% (in which case both cpus were 100% utilized). Hmm this means the testing wasn't successful then, as you said: The idea was to get it 100% loaded, so we can see that the polling is getting it to produce higher throughput. 
in fact here you are producing more throughput but spending more power to produce it, which can have any number of explanations besides polling improving the efficiency. For example, increasing system load might disable host power management. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Qemu: Fix eax for cpuid leaf 0x40000000
Il 12/08/2014 21:29, Eduardo Habkost ha scritto: On Tue, Aug 12, 2014 at 09:12:00PM +0200, Paolo Bonzini wrote: Il 12/08/2014 20:55, Eduardo Habkost ha scritto: This makes the CPUID data change under the guest's feet during live-migration. Adding compat code to ensure older machine-types keep the old behavior is necessary, but in this specific case it is mostly harmless because 0x0 is documented as being equivalent to 0x4001. (But I don't know how guests are supposed to behave when they see CPUID[KVM_CPUID_SIGNATURE_NEXT].EAX==0.) The only obvious thing to do would be to treat it as 0x4101. I just want to be sure the guests really do that. If we know guests won't do anything different with the CPUID change, I won't mind having no compat code for this. Considering that only two leaves are defined for KVM, and both are mandatory I don't think current guests have any reason to look at CPUID[KVM_CPUID_SIGNATURE | kvm_base].EAX at all. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly
Commit d40a6898e5 mistakenly caused instructions which are not marked as EmulateOnUD to be emulated upon #UD exception. The commit caused the check of whether the instruction flags include EmulateOnUD to never be evaluated. As a result, instructions whose emulation is broken may be emulated. This fix moves the evaluation of EmulateOnUD so it is evaluated. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 56657b0..37a83b2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4394,6 +4394,9 @@ done_prefixes: ctxt->execute = opcode.u.execute; + if (!(ctxt->d & EmulateOnUD) && ctxt->ud) + return EMULATION_FAILED; + if (unlikely(ctxt->d & (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) { /* @@ -4406,9 +4409,6 @@ done_prefixes: if (ctxt->d & NotImpl) return EMULATION_FAILED; - if (!(ctxt->d & EmulateOnUD) && ctxt->ud) - return EMULATION_FAILED; - if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack)) ctxt->op_bytes = 8; -- 1.9.1
Re: [PATCH v4] arm64: fix VTTBR_BADDR_MASK
On Aug 13, 2014, at 8:33 PM, Christoffer Dall wrote: On Tue, Aug 12, 2014 at 06:05:21PM +0200, Christoffer Dall wrote: On Mon, Aug 11, 2014 at 03:38:23PM -0500, Joel Schopp wrote: The current VTTBR_BADDR_MASK only masks 39 bits, which is broken on current systems. Rather than just add a bit, it seems like a good time to also set things at run-time instead of compile time to accommodate more hardware. This patch sets TCR_EL2.PS, VTCR_EL2.T0SZ and vttbr_baddr_mask at runtime, not compile time. In ARMv8, the EL2 physical address size (TCR_EL2.PS) and stage2 input address size (VTCR_EL2.T0SZ) cannot be determined at compile time since they depend on hardware capability. According to Table D4-23 and Table D4-25 in the ARM DDI 0487A.b document, vttbr_x is calculated using different fixed values with consideration of T0SZ, granule size and the level of translation tables. Therefore, vttbr_baddr_mask should be determined dynamically. Changes since v3: Another rebase Addressed minor comments from v2 Changes since v2: Rebased on https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next branch Changes since v1: Rebased fix on Jungseok Lee's patch https://lkml.org/lkml/2014/5/12/189 to provide a better long term fix. Updated that patch to log an error instead of silently failing on an unaligned vttbr.
Cc: Christoffer Dall christoffer.d...@linaro.org Cc: Sungjinn Chung sungjinn.ch...@samsung.com Signed-off-by: Jungseok Lee jays@samsung.com Signed-off-by: Joel Schopp joel.sch...@amd.com --- arch/arm/kvm/arm.c | 116 +- arch/arm64/include/asm/kvm_arm.h | 17 +- arch/arm64/kvm/hyp-init.S| 20 +-- 3 files changed, 131 insertions(+), 22 deletions(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 3c82b37..b4859fa 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -37,6 +37,7 @@ #include <asm/mman.h> #include <asm/tlbflush.h> #include <asm/cacheflush.h> +#include <asm/cputype.h> #include <asm/virt.h> #include <asm/kvm_arm.h> #include <asm/kvm_asm.h> @@ -61,6 +62,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1); static u8 kvm_next_vmid; static DEFINE_SPINLOCK(kvm_vmid_lock); +static u64 vttbr_baddr_mask; + static bool vgic_present; static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu) @@ -412,6 +415,103 @@ static bool need_new_vmid_gen(struct kvm *kvm) return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen)); } + + + /* +* ARMv8 64K architecture limitations: +* 16 <= T0SZ <= 21 is valid under 3 levels of translation tables +* 18 <= T0SZ <= 34 is valid under 2 levels of translation tables +* 31 <= T0SZ <= 39 is valid under 1 level of translation tables +* +* ARMv8 4K architecture limitations: +* 16 <= T0SZ <= 24 is valid under 4 levels of translation tables +* 21 <= T0SZ <= 30 is valid under 3 levels of translation tables this is still wrong, as I pointed out, it should be 21 <= T0SZ <= 30 typo: I meant: 21 <= T0SZ <= 33 Christoffer is right. The original patch, [1], described the conditions incorrectly. [1]: https://lkml.org/lkml/2014/5/12/189 - Jungseok Lee
Re: [PATCH] KVM: x86: Avoid emulating instructions on #UD mistakenly
Correction: the word “never” in the message is too harsh. Nonetheless, there is a regression bug. I encountered it with the “wrfsbase” instruction. Nadav On Aug 13, 2014, at 4:50 PM, Nadav Amit na...@cs.technion.ac.il wrote: Commit d40a6898e5 mistakenly caused instructions which are not marked as EmulateOnUD to be emulated upon #UD exception. The commit caused the check of whether the instruction flags include EmulateOnUD to never be evaluated. As a result, instructions whose emulation is broken may be emulated. This fix moves the evaluation of EmulateOnUD so it is evaluated. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 56657b0..37a83b2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4394,6 +4394,9 @@ done_prefixes: ctxt->execute = opcode.u.execute; + if (!(ctxt->d & EmulateOnUD) && ctxt->ud) + return EMULATION_FAILED; + if (unlikely(ctxt->d & (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) { /* @@ -4406,9 +4409,6 @@ done_prefixes: if (ctxt->d & NotImpl) return EMULATION_FAILED; - if (!(ctxt->d & EmulateOnUD) && ctxt->ud) - return EMULATION_FAILED; - if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack)) ctxt->op_bytes = 8; -- 1.9.1
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 12:48:41AM -0700, H. Peter Anvin wrote: The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness, instead it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different. Without getting into an argument about which definition of seed is correct --- it's certainly confusing and different from the RDSEED usage of the word seed. Do we expect that anyone else besides arch_get_rng_seed() would actually want to use it? I'd argue no; we want the rest of the kernel to either use get_random_bytes() or prandom_u32(). Given that, maybe we should just call it arch_random_init(), and expect that the only user of this interface would be drivers/char/random.c? - Ted
Re: [RFC PATCH] ARM: KVM: add irqfd support
On Mon, Aug 04, 2014 at 02:08:22PM +0200, Eric Auger wrote: This patch enables irqfd on ARM. The irqfd framework makes it possible to inject a virtual IRQ into a guest upon an eventfd trigger. Userspace uses the KVM_IRQFD VM ioctl to provide KVM with a kvm_irqfd struct that associates a VM, an eventfd, and an IRQ number (aka. the gsi). When an actor signals the eventfd (typically a VFIO platform driver), the kvm irqfd subsystem injects the provided virtual IRQ into the guest. The gsi must correspond to a shared peripheral interrupt (SPI), ie the GIC interrupt ID is gsi+32. Why can't we support PPIs? CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD are turned on. No IRQ routing table is used, thanks to Paul Mackerras' patch series IRQFD without IRQ routing, enabled for XICS (https://www.mail-archive.com/kvm@vger.kernel.org/msg104478.html) Signed-off-by: Eric Auger eric.au...@linaro.org --- This patch would deprecate the previous patch featuring GSI routing (https://patches.linaro.org/32261/) irqchip.c and irq_comm.c are not used at all. This RFC applies on top of Christoffer Dall's series arm/arm64: KVM: Various VGIC cleanups and improvements https://lists.cs.columbia.edu/pipermail/kvmarm/2014-June/009979.html All pieces can be found on git://git.linaro.org/people/eric.auger/linux.git branch irqfd_integ_v4 This work was tested with the Calxeda Midway xgmac main interrupt with qemu-system-arm and the QEMU VFIO platform device. --- Documentation/virtual/kvm/api.txt | 5 +++- arch/arm/include/uapi/asm/kvm.h | 3 +++ arch/arm/kvm/Kconfig | 3 ++- arch/arm/kvm/Makefile | 2 +- arch/arm/kvm/irq.h| 25 ++ virt/kvm/arm/vgic.c | 54 --- 6 files changed, 85 insertions(+), 7 deletions(-) create mode 100644 arch/arm/kvm/irq.h diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 0fe3649..04310d9 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2132,7 +2132,7 @@ into the hash PTE second double word).
4.75 KVM_IRQFD Capability: KVM_CAP_IRQFD -Architectures: x86 s390 +Architectures: x86 s390 arm Type: vm ioctl Parameters: struct kvm_irqfd (in) Returns: 0 on success, -1 on error @@ -2158,6 +2158,9 @@ Note that closing the resamplefd is not sufficient to disable the irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. +On ARM/arm64 the injected interrupt must be a shared peripheral interrupt (SPI). +This means the programmed GIC interrupt ID is gsi+32. + 4.76 KVM_PPC_ALLOCATE_HTAB Capability: KVM_CAP_PPC_ALLOC_HTAB diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h index e6ebdd3..3034c66 100644 --- a/arch/arm/include/uapi/asm/kvm.h +++ b/arch/arm/include/uapi/asm/kvm.h @@ -194,6 +194,9 @@ struct kvm_arch_memory_slot { /* Highest supported SPI, from VGIC_NR_IRQS */ #define KVM_ARM_IRQ_GIC_MAX 127 +/* One single KVM irqchip, ie. the VGIC */ +#define KVM_NR_IRQCHIPS 1 + /* PSCI interface */ #define KVM_PSCI_FN_BASE 0x95c1ba5e #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n)) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 4be5bb1..7800261 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -24,6 +24,7 @@ config KVM select KVM_MMIO select KVM_ARM_HOST depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN + select HAVE_KVM_EVENTFD ---help--- Support hosting virtualized guest machines. You will also need to select one or more of the processor modules below. @@ -55,7 +56,7 @@ config KVM_ARM_MAX_VCPUS config KVM_ARM_VGIC bool "KVM support for Virtual GIC" depends on KVM_ARM_HOST && OF - select HAVE_KVM_IRQCHIP + select HAVE_KVM_IRQFD default y ---help--- Adds support for a hardware assisted, in-kernel GIC emulation.
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile index 789bca9..2fa2f82 100644 --- a/arch/arm/kvm/Makefile +++ b/arch/arm/kvm/Makefile @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt) KVM := ../../../virt/kvm -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o obj-y += kvm-arm.o init.o interrupts.o obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h new file mode 100644 index 000..1275d91 --- /dev/null +++ b/arch/arm/kvm/irq.h @@ -0,0 +1,25 @@ +/* + * Copyright (C) 2014 Linaro Ltd. + * Authors: Eric Auger eric.au...@linaro.org + * + * This program is free software; you can redistribute it and/or
[GIT PULL] VFIO updates for 3.17-rc1
Hi Linus, The following changes since commit 7f0d32e0c1a7a23216a0f2694ec841f60e9dddfd: Merge tag 'microblaze-3.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze (2014-08-07 09:02:26 -0700) are available in the git repository at: git://github.com/awilliam/linux-vfio.git tags/vfio-v3.17-rc1 for you to fetch changes up to 9b936c960f22954bfb89f2fefd8f96916bb42908: drivers/vfio: Enable VFIO if EEH is not supported (2014-08-08 10:39:16 -0600) VFIO updates for v3.17-rc1 - Enable support for bus reset on device release - Fixes for EEH support Alex Williamson (3): vfio-pci: Release devices with BusMaster disabled vfio-pci: Use mutex around open, release, and remove vfio-pci: Attempt bus/slot reset on release Alexey Kardashevskiy (2): drivers/vfio: Allow EEH to be built as module drivers/vfio: Enable VFIO if EEH is not supported Gavin Shan (1): drivers/vfio: Fix EEH build error drivers/vfio/Kconfig| 6 ++ drivers/vfio/Makefile | 2 +- drivers/vfio/pci/vfio_pci.c | 161 drivers/vfio/pci/vfio_pci_private.h | 3 +- drivers/vfio/vfio_spapr_eeh.c | 17 +++- include/linux/vfio.h| 6 +- 6 files changed, 170 insertions(+), 25 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] KVM: SVM: add rdmsr support for AMD event registers
Current KVM only supports RDMSR for K7_EVNTSEL0 and K7_EVNTSEL0 MSRs. Reading the rest MSRs will trigger KVM to inject #GP into guest VM. This causes a warning message Failed to access perfctr msr (MSR c0010001 is ) on AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_EVNTSELn registers and thus supresses the warning message. Signed-off-by: Wei Huang wehu...@redhat.com --- arch/x86/kvm/x86.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ef432f8..3f10ca2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2399,7 +2399,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_K7_HWCR: case MSR_VM_HSAVE_PA: case MSR_K7_EVNTSEL0: + case MSR_K7_EVNTSEL1: + case MSR_K7_EVNTSEL2: + case MSR_K7_EVNTSEL3: case MSR_K7_PERFCTR0: + case MSR_K7_PERFCTR1: + case MSR_K7_PERFCTR2: + case MSR_K7_PERFCTR3: case MSR_K8_INT_PENDING_MSG: case MSR_AMD64_NB_CFG: case MSR_FAM10H_MMIO_CONF_BASE: -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: SVM: add rdmsr support for AMD event registers
Wrong one, sorry. Please discard this one. Updated one will follow. -Wei On 08/13/2014 10:58 AM, Wei Huang wrote: Current KVM only supports RDMSR for K7_EVNTSEL0 and K7_EVNTSEL0 MSRs. Reading the rest MSRs will trigger KVM to inject #GP into guest VM. This causes a warning message Failed to access perfctr msr (MSR c0010001 is ) on AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_EVNTSELn registers and thus supresses the warning message. Signed-off-by: Wei Huang wehu...@redhat.com --- arch/x86/kvm/x86.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ef432f8..3f10ca2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2399,7 +2399,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_K7_HWCR: case MSR_VM_HSAVE_PA: case MSR_K7_EVNTSEL0: + case MSR_K7_EVNTSEL1: + case MSR_K7_EVNTSEL2: + case MSR_K7_EVNTSEL3: case MSR_K7_PERFCTR0: + case MSR_K7_PERFCTR1: + case MSR_K7_PERFCTR2: + case MSR_K7_PERFCTR3: case MSR_K8_INT_PENDING_MSG: case MSR_AMD64_NB_CFG: case MSR_FAM10H_MMIO_CONF_BASE: -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] KVM: SVM: add rdmsr support for AMD event registers
Current KVM only supports RDMSR for the K7_EVNTSEL0 and K7_PERFCTR0 MSRs. Reading the rest of these MSRs triggers KVM to inject #GP into the guest VM. This causes a warning message "Failed to access perfctr msr (MSR c0010001 is )" on an AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_PERFCTRn registers and thus suppresses the warning message.

Signed-off-by: Wei Huang wehu...@redhat.com
---
 arch/x86/kvm/x86.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef432f8..3f10ca2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2399,7 +2399,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_K7_HWCR:
 	case MSR_VM_HSAVE_PA:
 	case MSR_K7_EVNTSEL0:
+	case MSR_K7_EVNTSEL1:
+	case MSR_K7_EVNTSEL2:
+	case MSR_K7_EVNTSEL3:
 	case MSR_K7_PERFCTR0:
+	case MSR_K7_PERFCTR1:
+	case MSR_K7_PERFCTR2:
+	case MSR_K7_PERFCTR3:
 	case MSR_K8_INT_PENDING_MSG:
 	case MSR_AMD64_NB_CFG:
 	case MSR_FAM10H_MMIO_CONF_BASE:
--
1.8.3.1
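The dispatch pattern the patch extends can be sketched outside of KVM as follows. The MSR index values below are the architectural AMD K7 ones, but `rdmsr_default_zero()` is an illustrative stand-in for `kvm_get_msr_common()`, not KVM's real signature: known-but-unimplemented perf MSRs read back as zero instead of taking the #GP-injection path.

```c
#include <assert.h>
#include <stdint.h>

/* AMD K7 performance MSR indices (architectural values). */
#define MSR_K7_EVNTSEL0 0xc0010000u
#define MSR_K7_EVNTSEL3 0xc0010003u
#define MSR_K7_PERFCTR0 0xc0010004u
#define MSR_K7_PERFCTR3 0xc0010007u

/*
 * Illustrative stand-in for KVM's MSR-read dispatch: known perf MSRs that
 * the hypervisor does not emulate read as 0 instead of faulting.
 * Returns 0 on success, -1 where KVM would inject #GP into the guest.
 */
int rdmsr_default_zero(uint32_t msr, uint64_t *pdata)
{
    if ((msr >= MSR_K7_EVNTSEL0 && msr <= MSR_K7_EVNTSEL3) ||
        (msr >= MSR_K7_PERFCTR0 && msr <= MSR_K7_PERFCTR3)) {
        *pdata = 0;        /* benign read: guest warning suppressed */
        return 0;
    }
    return -1;             /* unhandled MSR: #GP path */
}
```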
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 7:32 AM, Theodore Ts'o ty...@mit.edu wrote: On Wed, Aug 13, 2014 at 12:48:41AM -0700, H. Peter Anvin wrote: The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness, instead it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different. Without getting into an argument about which definition of seed is correct --- it's certainly confusing and different from the RDSEED usage of the word seed. Do we expect that anyone else besides arch_get_rnd_seed() would actually want to use it? If you mean random.c instead of arch_get_rnd_seed, then I don't expect there to be other users. Aside from the best-effort bit causing this to be basically useless on old bare metal, the interface is really awkward for anything other than the use in random.c. I'd argue no; we want the rest of the kernel to either use get_random_bytes() or prandom_u32(). Given that, maybe we should just call it arch_random_init(), and expect that the only user of this interface would be drivers/char/random.c? Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. --Andy
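The interface shape under discussion can be sketched as follows. Note that `arch_random_init()` with this exact signature is the proposal being debated, not an existing kernel API, and the stub below models an arch with no RDSEED/RDRAND/KVM seed source; the mixing function stands in for the generic core in drivers/char/random.c that would call the hook at boot and again on resume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical best-effort hook: fill up to n seed words and report how
 * many were actually obtained. On old bare metal with no hardware RNG and
 * no hypervisor seed, the honest answer is zero.
 */
size_t arch_random_init(uint64_t *seeds, size_t n)
{
    (void)seeds;
    (void)n;
    return 0;            /* best effort: nothing available on this arch */
}

/* Generic core mixes in whatever it got; called at init and after resume. */
uint64_t mix_arch_seeds(uint64_t pool)
{
    uint64_t s[4];
    size_t got = arch_random_init(s, 4);
    for (size_t i = 0; i < got; i++)
        pool ^= s[i];    /* XOR is a placeholder for the real pool mixing */
    return pool;
}
```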
Regression problem with commit 5045b46803
Commit 5045b46803 added a check that cs.dpl equals cs.rpl during task switch. Unfortunately, it causes some of my tests that run well on bare metal to fail. Although this check is mentioned in Table 7-1 of the SDM as causing a #TSS exception, it is not mentioned in Table 6-6, which lists the invalid-TSS conditions that cause #TSS exceptions. Thus, I recommend reverting commit 5045b46803, or alternatively rechecking task-switch behavior on bare metal. Nadav
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 09:13 AM, Andy Lutomirski wrote: Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. Yes, we should. We also need to make it possible to do this after cloning a VM. -hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 10:45:25AM -0700, H. Peter Anvin wrote: On 08/13/2014 09:13 AM, Andy Lutomirski wrote: Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. Yes, we should. We also need to make it possible to do this after cloning a VM. Agreed. Can you send a patch? I can carry the commits to add arch_random_init() (the generic version), and the patch to call it after suspend/resume. I'll do this at the very head of the random tree, and make sure it gets pushed to Linus early during the next merge window. Does that sound like a plan? Or does someone want to suggest something different? I'm flexible... - Ted
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 11:22 AM, Theodore Ts'o ty...@mit.edu wrote: On Wed, Aug 13, 2014 at 10:45:25AM -0700, H. Peter Anvin wrote: On 08/13/2014 09:13 AM, Andy Lutomirski wrote: Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. Yes, we should. We also need to make it possible to do this after cloning a VM. Agreed. Can you send a patch? I can carry the commits to add arch_random_init() (the generic version), and the patch to call it after suspend/resume. I'll do this at the very head of the random tree, and make sure it gets pushed to Linus early during the next merge window. Does that sound like a plan? Or does someone want to suggest something different? I'm flexible... OK. Here's a proposal. I'll split the series into two parts. The first part will be the arch_random_init generic change and code to call it after suspend/resume (once I figure out the right callback). I'll send that to you. The second part will be the KVM and x86 code, which will look just like this version (v5) except for the rename. If needed, hpa and I can hash out the details we need at KS. As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too. --Andy -- Andy Lutomirski AMA Capital Management, LLC
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 11:33 AM, Andy Lutomirski wrote: As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too. We don't need a reset when migrated (although it might be a good idea under some circumstances, i.e. if the pools might somehow have gotten exposed) but definitely when cloned. -hpa
[PATCH] x86: Reset MTRR on vCPU reset
The SDM specifies (June 2014 Vol3 11.11.5):

  On a hardware reset, the P6 and more recent processors clear the valid
  flags in variable-range MTRRs and clear the E flag in the
  IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the MTRRs
  are undefined.

We currently do none of that, so whatever MTRR settings you had prior to reset is what you have after reset. Usually this doesn't matter because KVM often ignores the guest mappings and uses write-back anyway. However, if you have an assigned device and an IOMMU that allows NoSnoop for that device, KVM defers to the guest memory mappings, which are now stale after reset. The result is that OVMF rebooting on such a configuration takes a full minute to LZMA-decompress the EFI volume, a process that is nearly instant on the initial boot.

Add support for resetting the SDM-defined bits on vCPU reset. Also, by my count we're already in danger of overflowing the entries array that we pass to KVM, so I've topped it up for a bit of headroom.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: qemu-sta...@nongnu.org
---
 target-i386/cpu.c |  6 ++
 target-i386/cpu.h |  4
 target-i386/kvm.c | 14 +-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6d008ab..b5ae654 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2588,6 +2588,12 @@ static void x86_cpu_reset(CPUState *s)

     env->xcr0 = 1;

+    /* MTRR init - Clear global enable bit and valid bit in each variable reg */
+    env->mtrr_deftype &= ~MSR_MTRRdefType_Enable;
+    for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+        env->mtrr_var[i].mask &= ~MSR_MTRRphysMask_Valid;
+    }
+
 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
     if (s->cpu_index == 0) {

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index e634d83..139890f 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -337,6 +337,8 @@
 #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg))
 #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1)

+#define MSR_MTRRphysMask_Valid (1 << 11)
+
 #define MSR_MTRRfix64K_00000 0x250
 #define MSR_MTRRfix16K_80000 0x258
 #define MSR_MTRRfix16K_A0000 0x259
@@ -353,6 +355,8 @@
 #define MSR_MTRRdefType 0x2ff

+#define MSR_MTRRdefType_Enable (1 << 11)
+
 #define MSR_CORE_PERF_FIXED_CTR0 0x309
 #define MSR_CORE_PERF_FIXED_CTR1 0x30a
 #define MSR_CORE_PERF_FIXED_CTR2 0x30b

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 097fe11..cb31338 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -79,6 +79,7 @@ static int lm_capable_kernel;
 static bool has_msr_hv_hypercall;
 static bool has_msr_hv_vapic;
 static bool has_msr_hv_tsc;
+static bool has_msr_mtrr;
 static bool has_msr_architectural_pmu;
 static uint32_t num_architectural_pmu_counters;
@@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
         env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
     }

+    if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
+        has_msr_mtrr = true;
+    }
+
     return 0;
 }
@@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
     CPUX86State *env = cpu->env;
     struct {
         struct kvm_msrs info;
-        struct kvm_msr_entry entries[100];
+        struct kvm_msr_entry entries[128];
     } msr_data;
     struct kvm_msr_entry *msrs = msr_data.entries;
     int n = 0, i;
@@ -1278,6 +1283,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
         kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_REFERENCE_TSC,
                           env->msr_hv_tsc);
     }
+    if (has_msr_mtrr) {
+        kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
+        for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+            kvm_msr_entry_set(&msrs[n++],
+                              MSR_MTRRphysMask(i), env->mtrr_var[i].mask);
+        }
+    }
     /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
      * kvm_put_msr_feature_control. */
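The reset logic in the patch above boils down to two masked clears. This standalone sketch isolates it; the macro values match the patch, but `struct mtrr_state` is an illustrative stand-in for CPUX86State's MTRR fields, not QEMU's actual layout.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_MTRRdefType_Enable (1ull << 11)
#define MSR_MTRRphysMask_Valid (1ull << 11)
#define MTRR_VCNT 8   /* stands in for MSR_MTRRcap_VCNT */

struct mtrr_state {
    uint64_t deftype;
    uint64_t var_mask[MTRR_VCNT];
};

/*
 * Per SDM 11.11.5: on reset, clear only the E flag in IA32_MTRR_DEF_TYPE
 * and the valid bit in each variable-range mask register; every other MTRR
 * bit is architecturally undefined and may legitimately keep its old value.
 */
void mtrr_vcpu_reset(struct mtrr_state *s)
{
    s->deftype &= ~MSR_MTRRdefType_Enable;
    for (int i = 0; i < MTRR_VCNT; i++)
        s->var_mask[i] &= ~MSR_MTRRphysMask_Valid;
}
```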
Re: [PATCH] x86: Reset MTRR on vCPU reset
a number of comments -- feel free to address or ignore each as you see fit: On 08/13/14 21:09, Alex Williamson wrote: The SDM specifies (June 2014 Vol3 11.11.5): On a hardware reset, the P6 and more recent processors clear the valid flags in variable-range MTRRs and clear the E flag in the IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the MTRRs are undefined. We currently do none of that, so whatever MTRR settings you had prior to reset is what you have after reset. Usually this doesn't matter because KVM often ignores the guest mappings and uses write-back anyway. However, if you have an assigned device and an IOMMU that allows NoSnoop for that device, KVM defers to the guest memory mappings which are now stale after reset. The result is that OVMF rebooting on such a configuration takes a full minute to LZMA decompress the EFI volume, a process that is nearly instant on the For pedantry, instead of EFI volume we could say LZMA-compressed Firmware File System file in the FVMAIN_COMPACT firmware volume. initial boot. Add support for reseting the SDM defined bits on vCPU reset. Also, by my count we're already in danger of overflowing the entries array that we pass to KVM, so I've topped it up for a bit of headroom. Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: qemu-sta...@nongnu.org --- target-i386/cpu.c |6 ++ target-i386/cpu.h |4 target-i386/kvm.c | 14 +- 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 6d008ab..b5ae654 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2588,6 +2588,12 @@ static void x86_cpu_reset(CPUState *s) env-xcr0 = 1; +/* MTRR init - Clear global enable bit and valid bit in each variable reg */ +env-mtrr_deftype = ~MSR_MTRRdefType_Enable; +for (i = 0; i MSR_MTRRcap_VCNT; i++) { +env-mtrr_var[i].mask = ~MSR_MTRRphysMask_Valid; +} + I can see that the limit, MSR_MTRRcap_VCNT, is #defined as 8. 
Would you be willing to update the definition of the CPUX86State.mtrr_var array too, in target-i386/cpu.h? Currently it says:

    MTRRVar mtrr_var[8];

 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
     if (s->cpu_index == 0) {

 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index e634d83..139890f 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -337,6 +337,8 @@
  #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg))
  #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1)

 +#define MSR_MTRRphysMask_Valid (1 << 11)
 +

Note: a signed integer (int32_t).

  #define MSR_MTRRfix64K_00000 0x250
  #define MSR_MTRRfix16K_80000 0x258
  #define MSR_MTRRfix16K_A0000 0x259
 @@ -353,6 +355,8 @@
  #define MSR_MTRRdefType 0x2ff

 +#define MSR_MTRRdefType_Enable (1 << 11)
 +

Note: a signed integer (int32_t).

Now, if you scroll back to the bit-clearing in x86_cpu_reset(), you see

    ~MSR_MTRRdefType_Enable

and

    ~MSR_MTRRphysMask_Valid

These expressions evaluate to negative int (int32_t) values (because the bit-neg sets their sign bits). Due to two's complement (which we are allowed to assume in qemu, see HACKING), the negative int32_t values will be just correct for the next step, when they are converted to uint64_t for the bit-ands, as part of the usual arithmetic conversions. (env->mtrr_deftype and env->mtrr_var[i].mask are uint64_t.) Mathematically this means an addition of UINT64_MAX+1. (Sign extended.)

In general, even though they are correct due to two's complement, I dislike such detours into negative-valued signed integers by way of bit-neg, because people are mostly unaware of them and assume they just work. My preferred solution would be

    #define MSR_MTRRphysMask_Valid (1ull << 11)
    #define MSR_MTRRdefType_Enable (1ull << 11)

Feel free to ignore this of course.
#define MSR_CORE_PERF_FIXED_CTR00x309 #define MSR_CORE_PERF_FIXED_CTR10x30a #define MSR_CORE_PERF_FIXED_CTR20x30b diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 097fe11..cb31338 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -79,6 +79,7 @@ static int lm_capable_kernel; static bool has_msr_hv_hypercall; static bool has_msr_hv_vapic; static bool has_msr_hv_tsc; +static bool has_msr_mtrr; static bool has_msr_architectural_pmu; static uint32_t num_architectural_pmu_counters; @@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs) env-kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave)); } +if (env-features[FEAT_1_EDX] CPUID_MTRR) { +has_msr_mtrr = true; +} + Seems to match MTRR Feature Identification in my (older) copy of the SDM. return 0; } @@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level) CPUX86State *env = cpu-env;
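The signed-integer detour Laszlo describes above is easy to demonstrate in isolation: `~(1 << 11)` is the negative int -2049, which the usual arithmetic conversions sign-extend to the uint64_t value 0xfffffffffffff7ff before the bit-and, so both spellings clear exactly the same bit. This snippet is standalone demonstration code, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* The signed detour: the int-typed mask sign-extends to 64 bits. */
uint64_t clear_valid_signed(uint64_t x)
{
    return x & ~(1 << 11);
}

/* The preferred spelling: the mask is unsigned 64-bit from the start. */
uint64_t clear_valid_unsigned(uint64_t x)
{
    return x & ~(1ull << 11);
}
```

Both functions produce identical results on every input, which is precisely why the signed form "just works" while still being worth avoiding for readability.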
Re: Logging Information
Hello, I am exploring ideas for clients in the cloud to be able to implement functions where they could verify the services offered by the cloud provider, such as metering services. The idea is that I am using the concept of a write-xor-execute protection scheme together with a tamper-evident log. I am making use of the WP bit to protect page table entries so that any modification is captured in the log. Code pages of the log are also read-only, and hence any modification to them is also captured. My questions are:

1. What are the important events that one needs to log so that one could have reasonable overhead? Currently, I have a large overhead since any page table update/modification creates a trap, and in the cloud this is huge.
2. How can one create a tamper-evident logging mechanism? How could the client and the provider verify that all events are logged as intended without a miss?
3. How can one create a logging mechanism on, say, a per-client basis? In that case, if required, we could replay the log so that we could capture the malicious event.
What to log in case of untrusted hypervisor
Hello, I am working on a testbed executing some secure applications on an untrusted hypervisor (in my case KVM). In order to verify the runtime integrity of the applications, I am using an idea based on write-xor-execute protection, protecting any page table updates of hypervisor/user code/data using the WP bit, making them read-only. I capture the request in the handler, temporarily make the page writable, log the event, and then make it read-only again. I am also using a tamper-evident logging mechanism to log any events related to it. I have a few questions:

1. What are the ideal events that one needs to log so that, if one needs to replay the log, one can do so to verify?
2. How can one create a tamper-evident logging mechanism? How could the client and the provider verify that all events are logged as intended without a miss?
3. How can one create a logging mechanism on, say, a per-client basis? In that case, if required, we could replay the log so that we could capture the malicious event.
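One common answer to the tamper-evident-log question above is a hash chain: each record's digest covers the previous record's digest, so altering any earlier entry changes every later link and is detectable on replay. The sketch below uses 64-bit FNV-1a purely as a stand-in for a real cryptographic hash (use SHA-256 or similar in practice); all structure and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy FNV-1a; replace with a cryptographic hash in any real deployment. */
static uint64_t fnv1a(uint64_t h, const void *p, size_t n)
{
    const unsigned char *b = p;
    while (n--) { h ^= *b++; h *= 0x100000001b3ull; }
    return h;
}

#define LOG_MAX 64
#define CHAIN_INIT 0xcbf29ce484222325ull   /* FNV offset basis as genesis */

struct te_log { uint64_t link[LOG_MAX]; size_t n; };

/* Each new link hashes the previous link together with the event text. */
void te_append(struct te_log *l, const char *event)
{
    if (l->n >= LOG_MAX)
        return;
    uint64_t prev = l->n ? l->link[l->n - 1] : CHAIN_INIT;
    l->link[l->n++] = fnv1a(prev, event, strlen(event));
}

/* Replay: recompute the chain from the raw events, compare stored links. */
int te_verify(const struct te_log *l, const char *const *events)
{
    uint64_t h = CHAIN_INIT;
    for (size_t i = 0; i < l->n; i++) {
        h = fnv1a(h, events[i], strlen(events[i]));
        if (h != l->link[i])
            return 0;   /* tampered: some earlier event was altered */
    }
    return 1;
}
```

For the client/provider verification in question 2, both sides would periodically exchange and countersign the latest link value, so neither can later rewrite history without the other detecting a mismatch on replay.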
Re: [PATCH] x86: Reset MTRR on vCPU reset
On Wed, 2014-08-13 at 22:33 +0200, Laszlo Ersek wrote: a number of comments -- feel free to address or ignore each as you see fit: On 08/13/14 21:09, Alex Williamson wrote: The SDM specifies (June 2014 Vol3 11.11.5): On a hardware reset, the P6 and more recent processors clear the valid flags in variable-range MTRRs and clear the E flag in the IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the MTRRs are undefined. We currently do none of that, so whatever MTRR settings you had prior to reset is what you have after reset. Usually this doesn't matter because KVM often ignores the guest mappings and uses write-back anyway. However, if you have an assigned device and an IOMMU that allows NoSnoop for that device, KVM defers to the guest memory mappings which are now stale after reset. The result is that OVMF rebooting on such a configuration takes a full minute to LZMA decompress the EFI volume, a process that is nearly instant on the For pedantry, instead of EFI volume we could say LZMA-compressed Firmware File System file in the FVMAIN_COMPACT firmware volume. Can you come up with something with maybe half that many words? And also, does it matter? I want someone using OVMF and experiencing a long reboot delay to know that this might fix their problem. Noting that the major time consuming stall is in the LZMA decompression code helps to rationalize why the mapping change is important. The specific blob of data that's being decompressed seems mostly irrelevant, which is why I only gave it 2 words. initial boot. Add support for reseting the SDM defined bits on vCPU reset. Also, by my count we're already in danger of overflowing the entries array that we pass to KVM, so I've topped it up for a bit of headroom. 
Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: qemu-sta...@nongnu.org --- target-i386/cpu.c |6 ++ target-i386/cpu.h |4 target-i386/kvm.c | 14 +- 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 6d008ab..b5ae654 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2588,6 +2588,12 @@ static void x86_cpu_reset(CPUState *s) env-xcr0 = 1; +/* MTRR init - Clear global enable bit and valid bit in each variable reg */ +env-mtrr_deftype = ~MSR_MTRRdefType_Enable; +for (i = 0; i MSR_MTRRcap_VCNT; i++) { +env-mtrr_var[i].mask = ~MSR_MTRRphysMask_Valid; +} + I can see that the limit, MSR_MTRRcap_VCNT, is #defined as 8. Would you be willing to update the definition of the CPUX86State.mtrr_var array too, in target-i386/cpu.h? Currently it says: I was tempted to do that, but I was hoping there was some deeper reasoning why these were already defined separately. For instance, what if we wanted to keep a stable vmstate size, but expose fewer variable MTRRs to the guest. MSR_MTRRcap_VCNT is the number exposed to the guest, so it makes sense that we only need to clear the valid bits on those. As I look through the commits that got us here, that was probably just wishful thinking. MTRRVar mtrr_var[8]; #if !defined(CONFIG_USER_ONLY) /* We hard-wire the BSP to the first CPU. */ if (s-cpu_index == 0) { diff --git a/target-i386/cpu.h b/target-i386/cpu.h index e634d83..139890f 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -337,6 +337,8 @@ #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg)) #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1) +#define MSR_MTRRphysMask_Valid (1 11) + Note: a signed integer (int32_t). #define MSR_MTRRfix64K_00x250 #define MSR_MTRRfix16K_80x258 #define MSR_MTRRfix16K_A0x259 @@ -353,6 +355,8 @@ #define MSR_MTRRdefType 0x2ff +#define MSR_MTRRdefType_Enable (1 11) + Note: a signed integer (int32_t). 
Now, if you scroll back to the bit-clearing in x86_cpu_reset(), you see ~MSR_MTRRdefType_Enable and ~MSR_MTRRphysMask_Valid These expressions evaluate to negative int (int32_t) values (because the bit-neg sets their sign bits). Due to two's complement (which we are allowed to assume in qemu, see HACKING), the negative int32_t values will be just correct for the next step, when they are converted to uint64_t for the bit-ands, as part of the usual arithmetic conversions. (env-mtrr_deftype and env-mtrr_var[i].mask are uint64_t.) Mathematically this means an addition of UINT64_MAX+1. (Sign extended.) In general, even though they are correct due to two's complement, I dislike such detours into negative-valued signed integers by way of bit-neg, because people are mostly unaware of them and assume they just work. My preferred solution would be #define MSR_MTRRphysMask_Valid (1ull 11) #define
Re: [PATCH] x86: Reset MTRR on vCPU reset
On 08/14/14 00:06, Alex Williamson wrote: On Wed, 2014-08-13 at 22:33 +0200, Laszlo Ersek wrote: a number of comments -- feel free to address or ignore each as you see fit: On 08/13/14 21:09, Alex Williamson wrote: mappings which are now stale after reset. The result is that OVMF rebooting on such a configuration takes a full minute to LZMA decompress the EFI volume, a process that is nearly instant on the For pedantry, instead of EFI volume we could say LZMA-compressed Firmware File System file in the FVMAIN_COMPACT firmware volume. Can you come up with something with maybe half that many words? Firmware volume then. Firmware volume is not a generic term, it's a specific term in the Platform Initialization (PI) spec. And also, does it matter? No. :) I want someone using OVMF and experiencing a long reboot delay to know that this might fix their problem. Noting that the major time consuming stall is in the LZMA decompression code helps to rationalize why the mapping change is important. The specific blob of data that's being decompressed seems mostly irrelevant, which is why I only gave it 2 words. Fair enough, it's just that EFI volume doesn't mean anything specific (to me), while firmware volume does. @@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level) CPUX86State *env = cpu-env; struct { struct kvm_msrs info; -struct kvm_msr_entry entries[100]; +struct kvm_msr_entry entries[128]; } msr_data; struct kvm_msr_entry *msrs = msr_data.entries; int n = 0, i; @@ -1278,6 +1283,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level) kvm_msr_entry_set(msrs[n++], HV_X64_MSR_REFERENCE_TSC, env-msr_hv_tsc); } +if (has_msr_mtrr) { +kvm_msr_entry_set(msrs[n++], MSR_MTRRdefType, env-mtrr_deftype); +for (i = 0; i MSR_MTRRcap_VCNT; i++) { +kvm_msr_entry_set(msrs[n++], + MSR_MTRRphysMask(i), env-mtrr_var[i].mask); +} +} /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see * kvm_put_msr_feature_control. 
*/ I think that this code is correct (and sufficient for the reset problem), but I'm uncertain if it's complete: (a) Shouldn't you put the matching PhysBase registers as well (for the variable range ones)? Plus, shouldn't you put mtrr_fixed[11] too (MSR_MTRRfix64K_0, ...)? If my change wasn't isolated to the reset portion of kvm_put_msrs() then I would agree with you. But since it is, all of those registers are undefined by the SDM. That's a good way to express your point indeed, and a good way to formulate my concern: I'm not sure your change is isolated to the reset portion. The check that gates the new hunk says level = KVM_PUT_RESET_STATE and a higher level than that does exist: KVM_PUT_FULL_STATE, which is used in incoming migration. (b) You only modify kvm_put_msrs(). What about kvm_get_msrs()? I can see that you make the msr putting dependent on: /* * The following MSRs have side effects on the guest or are too * heavy for normal writeback. Limit them to reset or full state * updates. */ if (level = KVM_PUT_RESET_STATE) { But that's probably not your reason for omitting matching new code from kvm_get_msrs(): HV_X64_MSR_REFERENCE_TSC is also heavy-weight (visible in your patch's context), but that one is nevertheless handled in kvm_get_msrs(). My only reason for (b) is simply symmetry. For example, commit 48a5f3bc added HV_X64_MSR_REFERENCE_TSC at once to both put() and get(). According to target-i386/machine.c, mtrr_deftype and co. are even migrated (part of vmstate), so this asymmetry could become a problem in migration. Eg. 
source host doesn't fetch MTRR state from KVM, hence the wire format carries garbage, but on the target you put (part of) that garbage (right now, just the mask) back into KVM:

do_savevm()
  qemu_savevm_state()
    qemu_savevm_state_complete()
      cpu_synchronize_all_states()
        cpu_synchronize_state()
          kvm_cpu_synchronize_state()
            do_kvm_cpu_synchronize_state()
              kvm_arch_get_registers()
                kvm_get_msrs()

do_loadvm()
  load_vmstate()
    qemu_loadvm_state()
      cpu_synchronize_all_post_init()
        cpu_synchronize_post_init()
          kvm_cpu_synchronize_post_init()
            kvm_arch_put_registers(..., KVM_PUT_FULL_STATE)
              kvm_put_msrs(..., KVM_PUT_FULL_STATE)

/* state subset modified during VCPU reset */
#define KVM_PUT_RESET_STATE 2
/* full state set, modified during initialization or on vmload */
#define KVM_PUT_FULL_STATE 3

Hence I suspect (a) and (b) should be handled. ... And then we arrive at cross-version migration, where both source and target hosts support MTRR, but the source qemu sends
Re: [PATCH v2] KVM: x86: check ISR and TMR to construct eoi exit bitmap
Hi Wei,

On Thu, Aug 14, 2014 at 03:16:25AM +0800, Wei Wang wrote:
From: Yang Zhang yang.z.zh...@intel.com

The guest may mask the IOAPIC entry before issuing an EOI. In such a case, the EOI will not be intercepted by the hypervisor, because the corresponding bit in the EOI exit bitmap is not set. The solution is to check ISR + TMR to construct the EOI exit bitmap. This patch is a better fix for the issue that commit 0f6c0a740b tries to solve.

I think you missed the changelog.

Regards,
Wanpeng Li

Tested-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 arch/x86/kvm/lapic.c | 17 +
 arch/x86/kvm/lapic.h |  2 ++
 arch/x86/kvm/x86.c   |  9 +
 virt/kvm/ioapic.c    |  7 ---
 4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 08e8a89..0ed4bcb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -515,6 +515,23 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
 }
 
+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+			      u32 *tmr)
+{
+	u32 i, reg_off, intr_in_service;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	for (i = 0; i < 8; i++) {
+		reg_off = 0x10 * i;
+		intr_in_service = apic_read_reg(apic, APIC_ISR + reg_off) &
+			kvm_apic_get_reg(apic, APIC_TMR + reg_off);
+		if (intr_in_service) {
+			*((u32 *)eoi_exit_bitmap + i) |= intr_in_service;
+			tmr[i] |= intr_in_service;
+		}
+	}
+}
+
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..4ee3d70 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -53,6 +53,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
 
+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+			      u32 *tmr);
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 204422d..755b556 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6005,6 +6005,15 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	memset(tmr, 0, 32);
 	kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
 
+	/*
+	 * The guest may mask the IOAPIC entry before issuing an EOI. In such
+	 * a case, the EOI will not be intercepted by the hypervisor, because
+	 * the corresponding bit in the EOI exit bitmap is not set.
+	 *
+	 * The solution is to check ISR + TMR to construct the EOI exit bitmap.
+	 */
+	kvm_apic_zap_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr);
+
 	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 	kvm_apic_update_tmr(vcpu, tmr);
 }
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index e8ce34c..2458a1d 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
 	spin_lock(&ioapic->lock);
 	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
 		e = &ioapic->redirtbl[index];
-		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
-		    index == RTC_GSI) {
+		if (!e->fields.mask &&
+		    (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+		     kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
+					  index) || index == RTC_GSI)) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
 				e->fields.dest_id, e->fields.dest_mode)) {
 				__set_bit(e->fields.vector,
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
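The loop added by kvm_apic_zap_eoi_exitmap() above can be modeled in a few lines: any vector that is both in service (set in ISR) and level-triggered (set in TMR) is OR-ed into the EOI exit bitmap, so the EOI gets intercepted even if the guest has since masked the IOAPIC entry. This is a simplified standalone sketch, not the kernel code (the real function reads the APIC registers through `apic_read_reg`/`kvm_apic_get_reg`).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the patch's loop: for each of the 8 x 32-bit APIC register
 * chunks, any vector in service AND level-triggered must force an EOI
 * exit, so OR it into the exit bitmap and the caller's tmr copy. */
static void zap_eoi_exitmap(const uint32_t isr[8], const uint32_t apic_tmr[8],
                            uint32_t eoi_exit_bitmap[8], uint32_t tmr[8])
{
    for (int i = 0; i < 8; i++) {
        uint32_t intr_in_service = isr[i] & apic_tmr[i];
        if (intr_in_service) {
            eoi_exit_bitmap[i] |= intr_in_service;
            tmr[i] |= intr_in_service;
        }
    }
}
```

For example, a level-triggered vector 0x21 (chunk 1, bit 1) that is in service ends up set in eoi_exit_bitmap[1] even if the redirection-table entry is now masked.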
Re: [PATCH] x86: Reset MTRR on vCPU reset
On 08/14/14 01:17, Laszlo Ersek wrote:

- With KVM, the lack of loading MTRR state from KVM, combined with the (partial) storing of MTRR state to KVM, has two consequences:
- migration invalidates (loses) MTRR state,

I'll concede that migration *already* loses MTRR state (on KVM), even before your patch. On the incoming host, the difference is that pre-patch, the guest continues running (after migration) with MTRRs in the initial KVM state, while post-patch, the guest continues running after an explicit zeroing of the variable MTRR masks and the deftype. I admit that it wouldn't be right to say that the patch causes MTRR state loss.

With that, I think I've actually convinced myself that your patch is correct:

The x86_cpu_reset() hunk is correct in any case, independently of KVM vs. TCG. (On TCG it even improves MTRR conformance.) Splitting that hunk into a separate patch might be worthwhile, but not overly important.

The kvm_put_msrs() hunk forces a zero write to the variable MTRR PhysMasks and the DefType, on both reset and on incoming migration. For reset, this is correct behavior. For incoming migration, it is not, but it certainly shouldn't qualify as a regression, relative to the current status (where MTRR state is simply lost and replaced with initial MTRR state on the incoming host).

I think the above end results could be expressed more clearly in the code, but I'm already wondering if you'll ever talk to me again, so I'm willing to give my R-b if you think that's useful... :) (Again, I might be wrong, easily.)

Thanks
Laszlo
Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
On 08/13/2014 12:30 AM, Christoffer Dall wrote: On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote: On 08/12/2014 02:50 AM, Christoffer Dall wrote: On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote: On 08/11/2014 12:13 PM, Christoffer Dall wrote: On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:

[...]

@@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	pte_t *pte = (pte_t *)data;
 
-	stage2_set_pte(kvm, NULL, gpa, pte, false);
+	stage2_set_pte(kvm, NULL, gpa, pte, false, false);

why is logging never active if we are called from MMU notifiers?

MMU notifiers update sptes, but I don't see how these updates can result in guest dirty pages. Also, guest pages are marked dirty from the 2nd stage page fault handlers (searching through the code).

Ok, then add:

/*
 * We can always call stage2_set_pte with logging_active == false,
 * because MMU notifiers will have unmapped a huge PMD before calling
 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
 * stage2_set_pte() never needs to clear out a huge PMD through this
 * calling path.
 */

So here, on a permission change to primary ptes, the kernel first invalidates the related s2ptes, followed by ->change_pte calls to synchronize the s2ptes. As a consequence of the invalidation we unmap huge PMDs if a page falls in that range. Is the comment meant to point out the use of the logging flag under various scenarios?

The comment is there because, when you look at this function, it is not obvious why we pass logging_active=false even though logging may actually be active. This could suggest that the parameter to stage2_set_pte() should be named differently (break_huge_pmds or something like that), but we can also be satisfied with the comment.

Ok I see, I was thinking you thought it was breaking something. Yeah, I'll add the comment; in reality this is another use case where a PMD may need to be converted to a page table, so it makes sense to contrast the use cases.
Should I add comments on flag use in other places as well?

It's always a judgement call. I didn't find it necessary to put a comment elsewhere because I think it's pretty obvious that we would never care about logging writes to device regions.

However, this made me think: are we making sure that we are not marking device mappings as read-only in the wp_range functions? I think it's quite bad if we mark the VCPU interface as read-only, for example.

-Christoffer

The KVM_SET_USER_MEMORY_REGION ioctl doesn't check the type of the region being installed/operated on (KVM_MEM_LOG_DIRTY_PAGES). In the case of QEMU, these regions wind up in KVMState->KVMSlot[]; when memory_region_add_subregion() is called, the KVM listener installs it. For migration and dirty page logging, QEMU walks the KVMSlot[] array.

For QEMU VFIO (PCI), mmap()ing a BAR of type IORESOURCE_MEM causes the memory region to be added to KVMState->KVMSlot[]. In that case it's possible to walk KVMState->KVMSlot[], issue the ioctl, and come across a device mapping with normal memory and write-protect its s2ptes (VFIO sets unmigratable state, though).

But I'm not sure what's there to stop someone calling the ioctl and installing a region with device memory type. Most likely, though, if you installed that kind of region, migration would be disabled. But for logging use alone, not checking the memory type could be an issue.
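The concern discussed above could be addressed by an explicit guard when deciding which memslots to write-protect for dirty logging. The sketch below is purely hypothetical: the minimal `struct memslot`, and the `KVM_MEM_DEVICE` flag in particular, are invented here for illustration (KVM has no such flag; only KVM_MEM_LOG_DIRTY_PAGES is real), so this shows the shape of the check rather than an actual KVM interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_MEM_LOG_DIRTY_PAGES (1u << 0)   /* real KVM flag */
#define KVM_MEM_DEVICE          (1u << 30)  /* invented for this sketch only */

/* Minimal stand-in for a memory slot; real KVM tracks much more. */
struct memslot {
    uint32_t flags;
};

/* Only write-protect slots that request dirty logging and are known to be
 * backed by normal memory; skipping device-backed slots avoids making
 * something like the VCPU interface read-only. */
static bool should_write_protect(const struct memslot *slot)
{
    return (slot->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
           !(slot->flags & KVM_MEM_DEVICE);
}
```

The point of the design is that the policy decision ("is this slot safe to write-protect?") lives in one predicate, instead of being implied by whoever happens to call the ioctl.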
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 11:44 AM, H. Peter Anvin wrote: On 08/13/2014 11:33 AM, Andy Lutomirski wrote:

As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too.

We don't need a reset when migrated (although it might be a good idea under some circumstances, i.e. if the pools might somehow have gotten exposed) but definitely when cloned. But yes, we need a notification. For obvious reasons there is no suspend event (one can snapshot a running VM), but we need to be notified upon wakeup, *or* we need to give KVM a way to update the necessary state.

-hpa
Re: The status about vhost-net on kvm-arm?
On 2014/8/13 19:25, Nikolay Nikolaev wrote: On Wed, Aug 13, 2014 at 12:10 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote:

Hello,

On Tue, Aug 12, 2014 at 5:41 AM, Li Liu john.li...@huawei.com wrote:

Hi all,

Can anyone tell me the current status of vhost-net on kvm-arm? Half a year has passed since Isa Ansharullah asked this question: http://www.spinics.net/lists/kvm-arm/msg08152.html

I have found two patches which have provided the kvm-arm support of eventfd and irqfd:

1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM
http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html

2) [RFC,v3] ARM: KVM: add irqfd and irq routing support
https://patches.linaro.org/32261/

And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan:

[Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio
https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

But there are no comments on this patch, and I can find nothing about qemu supporting irqfd. Have I lost track? If nobody is trying to fix it, we have a plan to complete it, with virtio-mmio supporting irqfd and multiqueue.

We at Virtual Open Systems did some work and tested vhost-net on ARM back in March. The setup was based on:
- a host kernel with our ioeventfd patches: http://www.spinics.net/lists/kvm-arm/msg08413.html
- qemu with the aforementioned patches from Ying-Shiuan Pan: https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

The testbed was an ARM Chromebook with Exynos 5250, using a 1Gbps USB3 Ethernet adapter connected to a 1Gbps switch. I can't find the actual numbers, but I remember that with multiple streams the gain was clearly seen. Note that it used the minimum required ioeventfd implementation and not irqfd. I guess it is feasible to think that it all can be put together and rebased + the recent irqfd work.
One can achieve even better performance (because of the irqfd). I managed to replicate the setup with the old versions we used in March: a single stream from another machine to the Chromebook with a 1Gbps USB3 Ethernet adapter.

iperf -c address -P 1 -i 1 -p 5001 -f k -t 10
to HOST: 858316 Kbits/sec
to GUEST: 761563 Kbits/sec
to GUEST vhost=off: 508150 Kbits/sec

10 parallel streams:
iperf -c address -P 10 -i 1 -p 5001 -f k -t 10
to HOST: 842420 Kbits/sec
to GUEST: 625144 Kbits/sec
to GUEST vhost=off: 425276 Kbits/sec

I have tested the same cases on a Hisilicon board (Cortex-A15@1G) with an integrated 1Gbps Ethernet adapter.

iperf -c address -P 1 -i 1 -p 5001 -f M -t 10
to HOST: 906 Mbits/sec
to GUEST: 562 Mbits/sec
to GUEST vhost=off: 340 Mbits/sec

With 10 parallel streams, performance improves by a further 10%:
iperf -c address -P 10 -i 1 -p 5001 -f M -t 10
to HOST: 923 Mbits/sec
to GUEST: 592 Mbits/sec
to GUEST vhost=off: 364 Mbits/sec

It's easy to see that vhost-net brings great performance improvements, almost 50%+.

Li.

___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

regards,
Nikolay Nikolaev
Virtual Open Systems
[PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change, but this is not true: it calls get_order() (which takes bytes) where it should have called ilog2(), and the kernel stops on a VM_BUG_ON(). This replaces get_order() with order_base_2() (the round-up version of ilog2).

Suggested-by: Paul Mackerras pau...@samba.org
Cc: Alexander Graf ag...@suse.de
Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Cc: Joonsoo Kim iamjoonsoo@lge.com
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v2:
* s/ilog2/order_base_2/
* removed cc: sta...@vger.kernel.org as I got the wrong impression that v3.16 is broken
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 329d7fd..b9615ba 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
+	page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
 	unsigned long align_pages = HPT_ALIGN_PAGES;
 
-	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+	VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
+	return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
--
2.0.0
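The bug the patch fixes can be demonstrated with simplified stand-ins for the two helpers. These are illustrative reimplementations assuming a 4 KiB page size and a hypothetical 16384-page region, not the actual kernel macros: get_order() interprets its argument as a size in bytes, while order_base_2() is a round-up ilog2, so feeding a page count to get_order() yields a far smaller (wrong) order.

```c
#include <assert.h>

#define PAGE_SHIFT 12  /* assume 4 KiB pages for illustration */

/* floor(log2(n)) for n > 0 -- simplified stand-in for the kernel's ilog2() */
static int ilog2(unsigned long n)
{
    int r = -1;
    while (n) {
        n >>= 1;
        r++;
    }
    return r;
}

/* ceil(log2(n)) -- stand-in for the kernel's order_base_2() */
static int order_base_2(unsigned long n)
{
    return (n & (n - 1)) ? ilog2(n) + 1 : ilog2(n);
}

/* allocation order for a size given in BYTES -- stand-in for get_order() */
static int get_order(unsigned long size)
{
    size = (size - 1) >> PAGE_SHIFT;
    return size ? ilog2(size) + 1 : 0;
}
```

With a hypothetical kvm_rma_pages of 16384 (a 64 MiB RMA in 4 KiB pages), cma_alloc() wants order_base_2(16384) = 14, but get_order(16384) treats 16384 as bytes and returns only 2 — hence the VM_BUG_ON() the commit message describes.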
Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy a...@ozlabs.ru writes:

fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change, but this is not true: it calls get_order() (which takes bytes) where it should have called ilog2(), and the kernel stops on a VM_BUG_ON(). This replaces get_order() with order_base_2() (the round-up version of ilog2).

Suggested-by: Paul Mackerras pau...@samba.org
Cc: Alexander Graf ag...@suse.de
Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Cc: Joonsoo Kim iamjoonsoo@lge.com
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

---
Changes:
v2:
* s/ilog2/order_base_2/
* removed cc: sta...@vger.kernel.org as I got the wrong impression that v3.16 is broken
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 329d7fd..b9615ba 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
+	page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
 	unsigned long align_pages = HPT_ALIGN_PAGES;
 
-	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+	VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
+	return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
--
2.0.0
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 7:41 PM, H. Peter Anvin h...@zytor.com wrote: On 08/13/2014 11:44 AM, H. Peter Anvin wrote: On 08/13/2014 11:33 AM, Andy Lutomirski wrote:

As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too.

We don't need a reset when migrated (although it might be a good idea under some circumstances, i.e. if the pools might somehow have gotten exposed) but definitely when cloned. But yes, we need a notification. For obvious reasons there is no suspend event (one can snapshot a running VM), but we need to be notified upon wakeup, *or* we need to give KVM a way to update the necessary state.

This could presumably use the interrupt mechanism on virtio-rng if we're willing to depend on having host support for virtio-rng. v6 (coming in a few minutes) will at least get it right when the kernel goes through the resume path (i.e. not KVM/QEMU suspend, and maybe not S0ix either).

--Andy
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/13/2014 05:18 AM, David Matlack wrote: On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

@@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 {
 	struct kvm_memslots *old_memslots = kvm->memslots;

I think you want slots->generation = old_memslots->generation; here. On the KVM_MR_DELETE path, install_new_memslots is called twice, so this patch introduces a short window of time where the generation number actually decreases.

Yes, indeed. Thank you for pointing it out, will update.
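The invariant David is asking for can be shown with a toy model: the new slots array must inherit the old generation before bumping it, so the generation never moves backwards even when install_new_memslots() runs twice in a row (as on the KVM_MR_DELETE path). This is a simplified standalone sketch, not the kernel code, and the struct here carries only the generation field.

```c
#include <assert.h>

/* Minimal stand-in for struct kvm_memslots: just the generation counter. */
struct memslots {
    unsigned long generation;
};

/* Swap in a new slots array while keeping the generation monotonic:
 * copy the old generation first, then increment. Returns the old array
 * (the kernel frees it after an RCU grace period). */
static struct memslots *install_new_memslots(struct memslots **cur,
                                             struct memslots *slots)
{
    struct memslots *old = *cur;

    slots->generation = old->generation;  /* inherit... */
    *cur = slots;
    slots->generation++;                  /* ...then strictly increase */
    return old;
}
```

Without the inherit step, a freshly zero-initialized slots array installed on the second call would briefly publish a generation lower than the one readers already observed, which is exactly the stale-mmio-cache window the thread discusses.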
[PATCH v6 0/7] random,x86,kvm: Rework arch RNG seeds and get some from kvm
This introduces and uses a very simple synchronous mechanism to get /dev/urandom-style bits appropriate for initial KVM PV guest RNG seeding.

It also reworks the way that architectural random data is fed into random.c's pools. Timekeeping randomness now comes directly from the timekeeping core rather than being pulled in from init_std_data, and timekeeping randomness is added both on boot and on resume.

I added a new arch hook called arch_rng_init. The default implementation is more or less the same as the current code, except that random_get_entropy is now called unconditionally. We now also call init_std_data on resume.

x86 gets a custom arch_rng_init. It will use KVM_GET_RNG_SEED if available, and, if it does anything, it will log the number of bits collected from each available architectural source. If more paravirt seed sources show up, it will be a natural place to add them.

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v5:
- Moved the generic changes to the beginning.
- Renamed arch_get_rng_seed to arch_rng_init.
- The timekeeping change is new.
- random.c registers a syscore callback to reseed on resume.

Changes from v4:
- Got rid of the RDRAND behavior change. If this series is accepted, I may resend it separately, but I think it's an unrelated issue.
- Fix up the changelog entries -- I misunderstood how the old code worked.
- Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not available.

Changes from v3:
- Other than KASLR, the guest pieces are completely rewritten. Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
- Bisection fix (patch 2 had a misplaced brace). The final state is identical to that of v2.
- Improve the 0/5 description a little bit.

Changes from v1:
- Split patches 2 and 3
- Log all arch sources in init_std_data
- Fix the 32-bit kaslr build

Andy Lutomirski (7):
  random: Add and use arch_rng_init
  random, timekeeping: Collect timekeeping entropy in the timekeeping code
  random: Reseed pools on resume
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  x86,random: Add an x86 implementation of arch_rng_init
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_rng_init
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig                     |  4 ++
 arch/x86/boot/compressed/aslr.c      | 27 +
 arch/x86/include/asm/archrandom.h    |  6 +++
 arch/x86/include/asm/kvm_guest.h     |  9 +
 arch/x86/include/asm/processor.h     | 21 --
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile             |  2 +
 arch/x86/kernel/archrandom.c         | 74 
 arch/x86/kernel/kvm.c                | 10 +
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   |  4 ++
 drivers/char/random.c                | 42 
 include/linux/random.h               | 40 +++
 kernel/time/timekeeping.c            | 11 ++
 15 files changed, 246 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c
--
1.9.3
[PATCH v6 7/7] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM. Rather than reinventing all of the CPU feature query code, this fixes native_cpuid to work in PIC objects. I haven't combined it with boot/cpuflags.c's cpuid implementation: including asm/processor.h from boot/cpuflags.c results in a flood of unrelated errors, and fixing it might be messy.

Reviewed-by: Kees Cook keesc...@chromium.org
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "movl %%ebx,%1\n\t"
+		     ".endif ; .endif \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
	      "=c" (*ecx),
	      "=d" (*edx)
	    : "0" (*eax), "2" (*ecx)
--
1.9.3
[PATCH v6 6/7] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_rng_init
This is a straightforward implementation: for each bit of internal RNG state, request one bit from KVM_GET_RNG_SEED. This is done even if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide cryptographically secure output even if the CPU's RNG is weak or compromised.

Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig                 |  4 
 arch/x86/include/asm/kvm_guest.h |  9 +
 arch/x86/kernel/archrandom.c     | 25 -
 arch/x86/kernel/kvm.c            | 10 ++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d24887b..ad87278 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -594,6 +594,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_RANDOM
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -1508,6 +1509,9 @@ config ARCH_RANDOM
 	  If supported, this is a high bandwidth, cryptographically secure
 	  hardware random number generator.
 
+	  This also enables paravirt RNGs such as KVM's if the relevant
+	  PV guest support is enabled.
+
 config X86_SMAP
 	def_bool y
 	prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@
 int kvm_setup_vsyscall_timeinfo(void);
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index e8d2ffb..adbaa25 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */
 
 #include <asm/archrandom.h>
+#include <asm/kvm_guest.h>
 
 void arch_rng_init(void *ctx,
 		   void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_rng_init(void *ctx,
 		   const char *log_prefix)
 {
 	int i;
-	int rdseed_bits = 0, rdrand_bits = 0;
+	int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
 	char buf[128] = "";
 	char *msgptr = buf;
@@ -42,10 +43,32 @@ void arch_rng_init(void *ctx,
 #endif
 	}
 
+	/*
+	 * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG
+	 * worked, since it incorporates entropy unavailable to the CPU,
+	 * and we shouldn't trust the hardware RNG more than we need to.
+	 * We request enough bits for the entire internal RNG state,
+	 * because there's no good reason not to.
+	 */
+	for (i = 0; i < bits_per_source; i += 64) {
+		u64 rv;
+
+		if (kvm_get_rng_seed(&rv)) {
+			seed(ctx, (u32)rv);
+			seed(ctx, (u32)(rv >> 32));
+			kvm_bits += 8 * sizeof(rv);
+		} else {
+			break;	/* If it fails once, it will keep failing. */
+		}
+	}
+
 	if (rdseed_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
 	if (rdrand_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (kvm_bits)
+		msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+				  kvm_bits);
 	if (buf[0])
 		pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+bool kvm_get_rng_seed(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+		rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
--
1.9.3
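The seeding loop in arch_rng_init() above can be modeled outside the kernel: a 64-bit seed source is drained one quadword at a time, split into two 32-bit words for the seed callback, until bits_per_source bits have been fed in or the source fails. The fake fixed-value source and the word-counting callback below are invented for this sketch; a real backend (RDSEED, RDRAND, or KVM_GET_RNG_SEED) would return fresh entropy and can stop producing at any point.

```c
#include <assert.h>
#include <stdint.h>

/* Fake seed source: always succeeds with a constant (illustration only). */
static int fake_get_seed(uint64_t *rv)
{
    *rv = 0x0123456789abcdefULL;
    return 1;
}

/* Drain the source 64 bits at a time, feeding the callback two 32-bit
 * words per iteration; stop early if the source fails (as the kernel
 * comment says, if it fails once it will keep failing). Returns the
 * number of bits actually collected. */
static int seed_from_source(void *ctx, void (*seed)(void *ctx, uint32_t data),
                            int bits_per_source)
{
    int i, bits = 0;

    for (i = 0; i < bits_per_source; i += 64) {
        uint64_t rv;

        if (!fake_get_seed(&rv))
            break;
        seed(ctx, (uint32_t)rv);
        seed(ctx, (uint32_t)(rv >> 32));
        bits += 64;  /* 8 * sizeof(rv), as in the patch */
    }
    return bits;
}

/* Callback that just counts how many 32-bit words it was fed. */
static void count_words(void *ctx, uint32_t data)
{
    (void)data;
    (*(int *)ctx)++;
}
```

Requesting 256 bits, for instance, runs four loop iterations and delivers eight 32-bit words to the callback.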
[PATCH v6 5/7] x86,random: Add an x86 implementation of arch_rng_init
This does the same thing as the generic implementation, except that it logs how many bits of each type it collected. I want to know whether the initial seeding is working and, if so, whether the RNG is fast enough. (I know that hpa assures me that the hardware RNG is more than fast enough, but I'd still like a direct way to verify this.)

Arguably, arch_get_random_seed could be removed now: I'm having some trouble imagining a sensible non-architecture-specific use of it that wouldn't be better served by arch_rng_init.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/include/asm/archrandom.h |  6 +++++
 arch/x86/kernel/Makefile          |  2 ++
 arch/x86/kernel/archrandom.c      | 51 ++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 69f1366..5611c21 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, RDSEED_INT, ASM_NOP4);
 #define arch_has_random()	static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed()	static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_RNG_INIT
+extern void arch_rng_init(void *ctx,
+			  void (*seed)(void *ctx, u32 data),
+			  int bits_per_source,
+			  const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)	+= archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION)	+= check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 000..e8d2ffb
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,51 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/archrandom.h>
+
+void arch_rng_init(void *ctx,
+		   void (*seed)(void *ctx, u32 data),
+		   int bits_per_source,
+		   const char *log_prefix)
+{
+	int i;
+	int rdseed_bits = 0, rdrand_bits = 0;
+	char buf[128] = "";
+	char *msgptr = buf;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv))
+			rdseed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			rdrand_bits += 8 * sizeof(rv);
+		else
+			continue;	/* Don't waste time mixing. */
+
+		seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+		seed(ctx, (u32)(rv >> 32));
+#endif
+	}
+
+	if (rdseed_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+	if (rdrand_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (buf[0])
+		pr_info("%s with %s\n", log_prefix, buf + 2);
+}
-- 
1.9.3
[PATCH v6 3/7] random: Reseed pools on resume
After a suspend/resume cycle, and especially after hibernating, we should assume that the random pools might have leaked. To minimize the risk this poses, try to collect fresh architectural entropy on resume.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 26 +++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 8dc3e3a..0811ad4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -257,6 +257,7 @@
 #include <linux/kmemcheck.h>
 #include <linux/workqueue.h>
 #include <linux/irq.h>
+#include <linux/syscore_ops.h>
 
 #include <asm/processor.h>
 #include <asm/uaccess.h>
@@ -1279,6 +1280,26 @@
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
+static void init_all_pools(void)
+{
+	init_std_data(&input_pool);
+	init_std_data(&blocking_pool);
+	init_std_data(&nonblocking_pool);
+}
+
+static void random_resume(void)
+{
+	/*
+	 * After resume (and especially after hibernation / kexec resume),
+	 * make a best-effort attempt to collect fresh entropy.
+	 */
+	init_all_pools();
+}
+
+static struct syscore_ops random_syscore_ops = {
+	.resume = random_resume,
+};
+
 /*
  * Note that setup_arch() may call add_device_randomness()
  * long before we get here. This allows seeding of the pools
@@ -1291,9 +1312,8 @@ static void init_std_data(struct entropy_store *r)
  */
 static int rand_initialize(void)
 {
-	init_std_data(&input_pool);
-	init_std_data(&blocking_pool);
-	init_std_data(&nonblocking_pool);
+	init_all_pools();
+	register_syscore_ops(&random_syscore_ops);
 	return 0;
 }
 early_initcall(rand_initialize);
-- 
1.9.3
[PATCH v6 4/7] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of host nonblocking entropy. This is independent of virtio-rng for a couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort cryptographically secure data for use as a seed. It provides no guarantee that the result contains any actual entropy.

Acked-by: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@
 KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN	0x4b564d02
 #define MSR_KVM_STEAL_TIME	0x4b564d03
 #define MSR_KVM_PV_EOI_EN	0x4b564d04
+#define MSR_KVM_GET_RNG_SEED	0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef432f8..695b682 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
[PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code
Currently, init_std_data calls ktime_get_real(). This imposes awkward constraints on when init_std_data can be called, and init_std_data is unlikely to collect the full unpredictable data available to the timekeeping code, especially after resume.

Remove this code from random.c and add the appropriate add_device_randomness calls to timekeeping.c instead.

Cc: John Stultz <john.stu...@linaro.org>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c     |  2 --
 kernel/time/timekeeping.c | 11 +++++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 7673e60..8dc3e3a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1263,12 +1263,10 @@ static void seed_entropy_store(void *ctx, u32 data)
 static void init_std_data(struct entropy_store *r)
 {
 	int i;
-	ktime_t now = ktime_get_real();
 	unsigned long rv;
 	char log_prefix[128];
 
 	r->last_pulled = jiffies;
-	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
 		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 32d8d6a..9609db9 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -23,6 +23,7 @@
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/compiler.h>
+#include <linux/random.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
@@ -835,6 +836,9 @@ void __init timekeeping_init(void)
 	memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));
 	write_seqcount_end(&timekeeper_seq);
+
+	add_device_randomness(tk, sizeof(tk));
+
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 
@@ -976,6 +980,13 @@ static void timekeeping_resume(void)
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
 	write_seqcount_end(&timekeeper_seq);
+
+	/*
+	 * The timekeeping state has a decent chance of differing
+	 * between resumptions of the same image.
+	 */
+	add_device_randomness(tk, sizeof(tk));
+
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	touch_softlockup_watchdog();
-- 
1.9.3
[PATCH v6 1/7] random: Add and use arch_rng_init
Currently, init_std_data contains its own logic for using arch random sources. This replaces that logic with a generic function arch_rng_init that allows arch code to supply its own logic. The default implementation tries arch_get_random_seed_long and arch_get_random_long individually.

The only functional change here is that random_get_entropy() is used unconditionally instead of being used only when the arch sources fail. This may add a tiny amount of security.

Acked-by: Theodore Ts'o <ty...@mit.edu>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c  | 14 +++++++++++---
 include/linux/random.h | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 71529e1..7673e60 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1246,6 +1246,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);
 
+static void seed_entropy_store(void *ctx, u32 data)
+{
+	mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}
 
 /*
  * init_std_data - initialize pool with system data
@@ -1261,15 +1265,19 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	char log_prefix[128];
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
-			rv = random_get_entropy();
+		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
+
+	sprintf(log_prefix, "random: seeded %s pool", r->name);
+	arch_rng_init(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+		      log_prefix);
+
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..c8d692e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifndef __HAVE_ARCH_RNG_INIT
+
+/**
+ * arch_rng_init() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data. An arch-specific implementation should be no worse
+ * than this generic implementation. If the arch code does something
+ * interesting, it may log something of the form "log_prefix with
+ * 8 bits of stuff".
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_rng_init(void *ctx,
+				 void (*seed)(void *ctx, u32 data),
+				 int bits_per_source,
+				 const char *log_prefix)
+{
+	int i;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv) ||
+		    arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+		}
+	}
+}
+
+#endif /* __HAVE_ARCH_RNG_INIT */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
[PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception
This patch emulates debug registers and the debug exception to support a guest using debug resources. This enables running gdb/kgdb etc. in the guest.

On the BOOKE architecture we cannot share debug resources between QEMU and the guest because:

  - When QEMU is using debug resources, the debug exception must always
    be enabled. To achieve this we set MSR_DE and also set MSRP_DEP so
    the guest cannot change MSR_DE.

  - When emulating debug resources for the guest, we want the guest to
    control MSR_DE (enable/disable the debug interrupt on need).

These two configurations cannot be supported at the same time, so we cannot share debug resources between QEMU and the guest on BOOKE. In the current design QEMU gets priority over the guest: if QEMU is using debug resources then the guest cannot use them, and if the guest is using them then QEMU can overwrite them.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
v3->v4
 - Clear only MRR on vcpu init

 arch/powerpc/include/asm/kvm_ppc.h   |   3 +
 arch/powerpc/include/asm/reg_booke.h |   2 +
 arch/powerpc/kvm/booke.c             |  42 +++++-
 arch/powerpc/kvm/booke_emulate.c     | 148 +++++++++++++++++++++
 4 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fb86a22..05e58b6 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -206,6 +206,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server,
 extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
 extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);
 
+void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu);
+void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu);
+
 union kvmppc_one_reg {
 	u32	wval;
 	u64	dval;
diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index 464f108..150d485 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -307,6 +307,8 @@
  * DBSR bits which have conflicting definitions on true Book E versus IBM 40x.
  */
 #ifdef CONFIG_BOOKE
+#define DBSR_IDE	0x80000000	/* Imprecise Debug Event */
+#define DBSR_MRR	0x30000000	/* Most Recent Reset */
 #define DBSR_IC		0x08000000	/* Instruction Completion */
 #define DBSR_BT		0x04000000	/* Branch Taken */
 #define DBSR_IRPT	0x02000000	/* Exception Debug Event */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 074b7fc..6901862 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -267,6 +267,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu)
 	clear_bit(BOOKE_IRQPRIO_WATCHDOG, &vcpu->arch.pending_exceptions);
 }
 
+void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu)
+{
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG);
+}
+
+void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu)
+{
+	clear_bit(BOOKE_IRQPRIO_DEBUG, &vcpu->arch.pending_exceptions);
+}
+
 static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1)
 {
 	kvmppc_set_srr0(vcpu, srr0);
@@ -735,7 +745,32 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	struct debug_reg *dbg_reg = &(vcpu->arch.dbg_reg);
 	u32 dbsr = vcpu->arch.dbsr;
 
-	/* Clear guest dbsr (vcpu->arch.dbsr) */
+	if (vcpu->guest_debug == 0) {
+		/*
+		 * Debug resources belong to Guest.
+		 * Imprecise debug event is not injected
+		 */
+		if (dbsr & DBSR_IDE) {
+			dbsr &= ~DBSR_IDE;
+			if (!dbsr)
+				return RESUME_GUEST;
+		}
+
+		if (dbsr && (vcpu->arch.shared->msr & MSR_DE) &&
+		    (vcpu->arch.dbg_reg.dbcr0 & DBCR0_IDM))
+			kvmppc_core_queue_debug(vcpu);
+
+		/* Inject a program interrupt if trap debug is not allowed */
+		if ((dbsr & DBSR_TIE) && !(vcpu->arch.shared->msr & MSR_DE))
+			kvmppc_core_queue_program(vcpu, ESR_PTR);
+
+		return RESUME_GUEST;
+	}
+
+	/*
+	 * Debug resource owned by userspace.
+	 * Clear guest dbsr (vcpu->arch.dbsr)
+	 */
 	vcpu->arch.dbsr = 0;
 	run->debug.arch.status = 0;
 	run->debug.arch.address = vcpu->arch.pc;
@@ -1249,6 +1284,11 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
 	setup_timer(&vcpu->arch.wdt_timer, kvmppc_watchdog_func,
 		    (unsigned long)vcpu);
 
+	/*
+	 * Clear DBSR.MRR to avoid guest debug interrupt as
+	 * this is of host interest
+	 */
+	mtspr(SPRN_DBSR, DBSR_MRR);
 	return 0;
 }
 
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index
[PATCH] KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
This was missed in the respective one_reg implementation patch.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 Documentation/virtual/kvm/api.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a21ff22..9177f23 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1878,6 +1878,8 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_ARCH_COMPAT     | 32
   PPC   | KVM_REG_PPC_DABRX           | 32
   PPC   | KVM_REG_PPC_WORT            | 64
+  PPC   | KVM_REG_PPC_SPRG9           | 64
+  PPC   | KVM_REG_PPC_DBSR            | 32
   PPC   | KVM_REG_PPC_TM_GPR0         | 64
           ...
   PPC   | KVM_REG_PPC_TM_GPR31        | 64
-- 
1.9.3
Re: [PATCH] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy <a...@ozlabs.ru> writes:

> fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
> functional change, but this is not true: it calls get_order() (which
> takes bytes) where it should have called ilog2(), and the kernel stops
> on a VM_BUG_ON(). This replaces get_order() with ilog2().

Should we round it up? ie, ilog2(kvm_rma_pages - 1) + 1 ?

> Suggested-by: Paul Mackerras <pau...@samba.org>
> Cc: Alexander Graf <ag...@suse.de>
> Cc: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
> Cc: Joonsoo Kim <iamjoonsoo@lge.com>
> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> Cc: sta...@vger.kernel.org

Why stable? We merged it this merge window.

> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 329d7fd..bfe9f01 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
>  	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
>  	if (!ri)
>  		return NULL;
> -	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
> +	page = cma_alloc(kvm_cma, kvm_rma_pages, ilog2(kvm_rma_pages));
>  	if (!page)
>  		goto err_out;
>  	atomic_set(&ri->use_count, 1);
> @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
>  {
>  	unsigned long align_pages = HPT_ALIGN_PAGES;
>
> -	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
> +	VM_BUG_ON(ilog2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
>
>  	/* Old CPUs require HPT aligned on a multiple of its size */
>  	if (!cpu_has_feature(CPU_FTR_ARCH_206))
>  		align_pages = nr_pages;
> -	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
> +	return cma_alloc(kvm_cma, nr_pages, ilog2(align_pages));
>  }
>  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
> --
> 2.0.0
[PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change, but this is not true: it calls get_order() (which takes bytes) where it should have called ilog2(), and the kernel stops on a VM_BUG_ON(). This replaces get_order() with order_base_2() (the round-up version of ilog2).

Suggested-by: Paul Mackerras <pau...@samba.org>
Cc: Alexander Graf <ag...@suse.de>
Cc: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo@lge.com>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
Changes:
v2:
 * s/ilog2/order_base_2/
 * removed cc: sta...@vger.kernel.org as I got wrong impression that
   v3.16 is broken
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 329d7fd..b9615ba 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
+	page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
 	unsigned long align_pages = HPT_ALIGN_PAGES;
 
-	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+	VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
+	return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
-- 
2.0.0
Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy <a...@ozlabs.ru> writes:

> fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
> functional change, but this is not true: it calls get_order() (which
> takes bytes) where it should have called ilog2(), and the kernel stops
> on a VM_BUG_ON(). This replaces get_order() with order_base_2()
> (the round-up version of ilog2).
>
> Suggested-by: Paul Mackerras <pau...@samba.org>
> Cc: Alexander Graf <ag...@suse.de>
> Cc: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
> Cc: Joonsoo Kim <iamjoonsoo@lge.com>
> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>

Reviewed-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>

> ---
> Changes:
> v2:
>  * s/ilog2/order_base_2/
>  * removed cc: sta...@vger.kernel.org as I got wrong impression that
>    v3.16 is broken
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 329d7fd..b9615ba 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
>  	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
>  	if (!ri)
>  		return NULL;
> -	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
> +	page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
>  	if (!page)
>  		goto err_out;
>  	atomic_set(&ri->use_count, 1);
> @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
>  {
>  	unsigned long align_pages = HPT_ALIGN_PAGES;
>
> -	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
> +	VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
>
>  	/* Old CPUs require HPT aligned on a multiple of its size */
>  	if (!cpu_has_feature(CPU_FTR_ARCH_206))
>  		align_pages = nr_pages;
> -	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
> +	return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
>  }
>  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
> --
> 2.0.0