Re: kvm ivy bridge support?
virsh cpu-models x86_64 | sort 486 Conroe Haswell Nehalem Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5 Penryn SandyBridge Westmere athlon core2duo coreduo cpu64-rhel5 cpu64-rhel6 kvm32 kvm64 n270 pentium pentium2 pentium3 pentiumpro phenom qemu32 qemu64 interesting that it doesn't have Ivy Bridge, any reason? On Fri, Dec 5, 2014 at 3:29 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 05/12/2014 02:33, t...@tetrioncapital.com wrote: May I know how could I check CPU support list on KVM? QEMU: qemu-kvm -cpu help|grep x86 (or /usr/libexec/qemu-kvm -cpu help|grep x86 depending on your distribution) Libvirt: virsh cpu-models x86_64|sort Paolo -- Thomas Lau Director of Infrastructure Tetrion Capital Limited Direct: +852-3976-8903 Mobile: +852-9323-9670 Address: 20/F, IFC 1, Central district, Hong Kong -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM
Hi, In KVMGT, we need to register an iodev only *after* BAR registers are written by guest. Oh, the guest can write the bar register at any time. Typically it happens at boot only, but it can also happen at runtime, for example on reboot. I've also seen the kernel redoing the pci mappings created by the bios, due to buggy _crs declarations in the qemu acpi tables. https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian /me goes read this. A few comments on the kernel stuff (brief look so far, also compile-tested only, intel gfx on my test machine is too old). * Noticed the kernel bits don't even compile when configured as module. Everything (vgt, i915, kvm) must be compiled into the kernel. * Design approach still seems to be i915 on vgt not the other way around. Qemu/SeaBIOS bits: I've seen the host bridge changes identity from i440fx to copy-pci-ids-from-host. Guess the reason for this is that seabios uses this device to figure whenever it is running on i440fx or q35. Correct? What are the exact requirements for the device? Must it match the host exactly, to not confuse the guest intel graphics driver? Or would something more recent -- such as the q35 emulation qemu has -- be good enough to make things work (assuming we add support for the graphic-related pci config space registers there)? The patch also adds a dummy isa bridge at 0x1f. Simliar question here: What exactly is needed here? Would things work if we simply use the q35 lpc device here? more to come after I've read the paper linked above ... cheers, Gerd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/5] ARM: on IO mem abort - route the call to KVM MMIO bus
On Sat, Nov 29, 2014 at 1:28 PM, Christoffer Dall christoffer.d...@linaro.org wrote: On Mon, Nov 24, 2014 at 11:26:51PM +0200, Nikolay Nikolaev wrote: On IO memory abort, try to handle the MMIO access thorugh the KVM registered read/write callbacks. This is done by invoking the relevant kvm_io_bus_* API. Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com --- arch/arm/kvm/mmio.c | 33 + 1 file changed, 33 insertions(+) diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c index 4cb5a93..81230da 100644 --- a/arch/arm/kvm/mmio.c +++ b/arch/arm/kvm/mmio.c @@ -162,6 +162,36 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return 0; } +/** + * kvm_handle_mmio - handle an in-kernel MMIO access + * @vcpu:pointer to the vcpu performing the access + * @run: pointer to the kvm_run structure + * @mmio:pointer to the data describing the access + * + * returns true if the MMIO access has been performed in kernel space, + * and false if it needs to be emulated in user space. + */ +static bool handle_kernel_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, + struct kvm_exit_mmio *mmio) +{ + int ret; + + if (mmio-is_write) { + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, mmio-phys_addr, + mmio-len, mmio-data); + + } else { + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, mmio-phys_addr, + mmio-len, mmio-data); + } + if (!ret) { + kvm_prepare_mmio(run, mmio); + kvm_handle_mmio_return(vcpu, run); + } + + return !ret; +} + int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, phys_addr_t fault_ipa) { @@ -200,6 +230,9 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, if (vgic_handle_mmio(vcpu, run, mmio)) return 1; + if (handle_kernel_mmio(vcpu, run, mmio)) + return 1; + Is this stuff always synchronously handled so that the mmio is properly populated upon handle_kernel_mmio on reads? If I get it right the kvm_io_bus_ API is intended to work synchronously. Of course it probably depends on how the registered device handles the read/write call. Or maybe I misunderstand your question? Please clarify in that case. regards, Nikolay Nikolaev -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend
On Sat, Nov 29, 2014 at 3:54 PM, Nikolay Nikolaev n.nikol...@virtualopensystems.com wrote: On Sat, Nov 29, 2014 at 1:29 PM, Christoffer Dall christoffer.d...@linaro.org wrote: On Mon, Nov 24, 2014 at 11:26:58PM +0200, Nikolay Nikolaev wrote: In io_mem_abort remove the call to vgic_handle_mmio. The target is to have a single MMIO handling path - that is through the kvm_io_bus_ API. Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region. Both read and write calls are redirected to vgic_io_dev_access where kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio. Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com --- arch/arm/kvm/mmio.c|3 -- include/kvm/arm_vgic.h |3 +- virt/kvm/arm/vgic.c| 88 3 files changed, 74 insertions(+), 20 deletions(-) diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c index 81230da..1c44a2b 100644 --- a/arch/arm/kvm/mmio.c +++ b/arch/arm/kvm/mmio.c @@ -227,9 +227,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, if (mmio.is_write) mmio_write_buf(mmio.data, mmio.len, data); - if (vgic_handle_mmio(vcpu, run, mmio)) - return 1; - if (handle_kernel_mmio(vcpu, run, mmio)) return 1; diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index e452ef7..d9b7d2a 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -233,6 +233,7 @@ struct vgic_dist { unsigned long *irq_pending_on_cpu; struct vgic_vm_ops vm_ops; + struct kvm_io_device*io_dev; #endif }; @@ -307,8 +308,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num, bool level); void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg); int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, - struct kvm_exit_mmio *mmio); #define irqchip_in_kernel(k) (!!((k)-arch.vgic.in_kernel)) #define vgic_initialized(k) ((k)-arch.vgic.ready) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 1213da5..3da1115 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -31,6 +31,9 @@ #include asm/kvm_emulate.h #include asm/kvm_arm.h #include asm/kvm_mmu.h +#include asm/kvm.h + +#include iodev.h /* * How the whole thing works (courtesy of Christoffer Dall): @@ -775,28 +778,81 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run, return true; } -/** - * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation - * @vcpu: pointer to the vcpu performing the access - * @run: pointer to the kvm_run structure - * @mmio: pointer to the data describing the access - * - * returns true if the MMIO access has been performed in kernel space, - * and false if it needs to be emulated in user space. - * Calls the actual handling routine for the selected VGIC model. - */ -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, - struct kvm_exit_mmio *mmio) +static int vgic_io_dev_access(struct kvm_vcpu *vcpu, struct kvm_io_device *this, + gpa_t addr, int len, void *val, bool is_write) { - if (!irqchip_in_kernel(vcpu-kvm)) - return false; + struct kvm_exit_mmio mmio; + bool ret; + + mmio = (struct kvm_exit_mmio) { + .phys_addr = addr, + .len = len, + .is_write = is_write, + }; + + if (is_write) + memcpy(mmio.data, val, len); /* * This will currently call either vgic_v2_handle_mmio() or * vgic_v3_handle_mmio(), which in turn will call * vgic_handle_mmio_range() defined above. */ - return vcpu-kvm-arch.vgic.vm_ops.handle_mmio(vcpu, run, mmio); + ret = vcpu-kvm-arch.vgic.vm_ops.handle_mmio(vcpu, vcpu-run, mmio); + + if (!is_write) + memcpy(val, mmio.data, len); + + return ret ? 0 : 1; +} + +static int vgic_io_dev_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this, + gpa_t addr, int len, void *val) +{ + return vgic_io_dev_access(vcpu, this, addr, len, val, false); +} + +static int vgic_io_dev_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, +gpa_t addr, int len, const void *val) +{ + return vgic_io_dev_access(vcpu, this, addr, len, (void *)val, true); +} + +static const struct kvm_io_device_ops vgic_io_dev_ops = { + .read = vgic_io_dev_read, + .write = vgic_io_dev_write, +}; + +static int vgic_register_kvm_io_dev(struct kvm *kvm) +{ + struct kvm_io_device *dev; + int ret; + + struct vgic_dist *dist = kvm-arch.vgic; + unsigned long base = dist-vgic_dist_base; + + dev =
Re: [RFC PATCH 4/5] ARM: enable linking against eventfd and irqchip
On Sat, Nov 29, 2014 at 1:29 PM, Christoffer Dall christoffer.d...@linaro.org wrote: Inidicate KVM in the subject please. On Mon, Nov 24, 2014 at 11:27:06PM +0200, Nikolay Nikolaev wrote: This enables compilation of the eventfd feature on ARM. does this enable anything else than the ioeventfd stuff so that everything is in place to do this safely now? Inspecting the eventfd.c I see that there is the IRQFD code conditionally compiled when CONFIG_HAVE_KVM_IRQFD is defined. I think this is enabled for arm/arm64 in Eric's IRQFD patcheries, so we should be safe now. regards, Nikolay Nikolaev -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM
On 05/12/2014 09:50, Gerd Hoffmann wrote: A few comments on the kernel stuff (brief look so far, also compile-tested only, intel gfx on my test machine is too old). * Noticed the kernel bits don't even compile when configured as module. Everything (vgt, i915, kvm) must be compiled into the kernel. I'll add that the patch is basically impossible to review with all the XenGT bits still in. For example, the x86 emulator seems to be unnecessary for KVMGT, but I am not 100% sure. I would like a clear understanding of why/how Andrew Barnes was able to do i915 passthrough (GVT-d) without hacking the ISA bridge, and why this does not apply to GVT-g. Paolo * Design approach still seems to be i915 on vgt not the other way around. Qemu/SeaBIOS bits: I've seen the host bridge changes identity from i440fx to copy-pci-ids-from-host. Guess the reason for this is that seabios uses this device to figure whenever it is running on i440fx or q35. Correct? What are the exact requirements for the device? Must it match the host exactly, to not confuse the guest intel graphics driver? Or would something more recent -- such as the q35 emulation qemu has -- be good enough to make things work (assuming we add support for the graphic-related pci config space registers there)? The patch also adds a dummy isa bridge at 0x1f. Simliar question here: What exactly is needed here? Would things work if we simply use the q35 lpc device here? more to come after I've read the paper linked above ... cheers, Gerd ___ Intel-gfx mailing list intel-...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/9] kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest
(I somehow managed to review one patch twice instead of this one ...) 2014-12-04 16:57+0100, Paolo Bonzini: From: Wanpeng Li wanpeng...@linux.intel.com Expose the XSAVES feature to the guest if the kvm_x86_ops say it is available. Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- Reviewed-by: Radim Krčmář rkrc...@redhat.com --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -69,6 +69,7 @@ #define SECONDARY_EXEC_PAUSE_LOOP_EXITING0x0400 #define SECONDARY_EXEC_ENABLE_INVPCID0x1000 #define SECONDARY_EXEC_SHADOW_VMCS 0x4000 +#define SECONDARY_EXEC_XSAVES0x0010 (Email exposes tab alignment nicely.) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Intel-gfx] [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM
On Fri, Dec 05, 2014 at 09:50:21AM +0100, Gerd Hoffmann wrote: https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian /me goes read this. A few comments on the kernel stuff (brief look so far, also compile-tested only, intel gfx on my test machine is too old). * Noticed the kernel bits don't even compile when configured as module. Everything (vgt, i915, kvm) must be compiled into the kernel. * Design approach still seems to be i915 on vgt not the other way around. Yeah done a quick read-through of just the i915 bits too, same comment. I guess this is just the first RFC and the redesign we've discussed about already with xengt is in progress somewhere? Thanks, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[virtio-mmio]: arm-only?
Is virtio_mmio used only for arm emulation? I was looking through qemu sources and the only file that I have found to instantiate virtio-mmio proxies is arm/virt.c . Cata -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm ivy bridge support?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 12/05/2014 03:17 AM, Thomas Lau wrote: virsh cpu-models x86_64 | sort ... Penryn SandyBridge Westmere ... interesting that it doesn't have Ivy Bridge, any reason? The reason would be that you are running an older version. - -- All rights reversed -BEGIN PGP SIGNATURE- Version: GnuPG v1 iQEcBAEBAgAGBQJUgc4KAAoJEM553pKExN6DHGoH/iRWWBCGO0oIgRuA2iEGVYhh M5iUVJfxzdK6S4sNRPRnUHW8YqQVkKQSULBna4zYPI6q/9d5llDZog8Ejk0XES23 8yembpEM5SsKW2bUzaxpd1t8g096+tX7tmX/i06VOyzSnRDMpyqBAR0cX2c/Yr44 2lFu7IoGI8E+ZM9CeOnCjtKBIsr5ZMV2S9++WKo9EpZAvhMdsxFQh5a47PSShJes 0Dib/KEmg3S45+eWRvDoqI6KwAgTXJ0MvvMkXvWT2uQ25BHu7AzekKl2jsSnKJeq 4PWgD23dDn1quUfeP7GJq5/zN+Rv2Apw4cQZYtVAmm7pfmMOaM8xV5A2R0ygyTw= =o0NK -END PGP SIGNATURE- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Stupid Xen vs KVM question
On Fri, Dec 05, 2014 at 08:29:54AM +0100, Paolo Bonzini wrote: On 05/12/2014 03:24, Konrad Rzeszutek Wilk wrote: We could do a simple thing - which is that the paravirt_enabled could have the value 1 for Xen and 2 for KVM. The assembler logic would be inverted and just check for 1. I am not going to attempt to write the assembler code :-) Wouldn't Xen HVM also want to be 2? Oddly enough it was never set! Looking at where the paravit_enabled() macro is used, on KVM it could be just set to zero. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
On Thu, Dec 04, 2014 at 02:53:26PM +0100, Paolo Bonzini wrote: On 25/11/2014 21:24, Thomas Gleixner wrote: On Tue, 25 Nov 2014, Rik van Riel wrote: On 11/25/2014 12:21 PM, Marcelo Tosatti wrote: Since lapic timer handler only wakes up a simple waitqueue, it can be executed from hardirq context. Reduces average cyclictest latency by 3us. Can this patch be merged in the KVM tree, and go upstream via Paolo? Not really as it has RT dependencies Can hrtimer_start_expires even return ETIME on a non-RT kernel? If yes, I can take patch 2. If not, it's better to keep both patches in the RT tree. Paolo It can't. We're still evaluating the necessity of this patch, will resubmit accordingly. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT
On Wed, Dec 03, 2014 at 08:39:47PM +0100, Luis R. Rodriguez wrote: On Wed, Dec 03, 2014 at 05:37:51AM +0100, Juergen Gross wrote: On 12/03/2014 03:28 AM, Luis R. Rodriguez wrote: On Tue, Dec 02, 2014 at 11:11:18AM +, David Vrabel wrote: On 01/12/14 22:36, Luis R. Rodriguez wrote: Then I do agree its a fair analogy (and find this obviously odd that how widespread cond_resched() is), we just don't have an equivalent for IRQ context, why not avoid the special check then and use this all the time in the middle of a hypercall on the return from an interrupt (e.g., the timer interrupt)? http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html OK thanks! That explains why we need some asm code but in that submission you still also had used is_preemptible_hypercall(regs) and in the new implementation you use a CPU variable xen_in_preemptible_hcall prior to calling preempt_schedule_irq(). I believe you added the CPU variable because preempt_schedule_irq() will preempt first without any checks if it should, I'm asking why not do something like cond_resched_irq() where we check with should_resched() prior to preempting and that way we can avoid having to use the CPU variable? Because that could preempt at any asynchronous interrupt making the no-preempt kernel fully preemptive. OK yeah I see. That still doesn't negate the value of using something like cond_resched_irq() with a should_resched() on only critical hypercalls. The current implementation (patch by David) forces preemption without checking for should_resched() so it would preempt unnecessarily at least once. How would you know you are just doing a critical hypercall which should be preempted? You would not, you're right. I was just trying to see if we could generalize an API for this to avoid having users having to create their own CPU variables but this all seems very specialized as we want to use this on the timer so if we do generalize a cond_resched_irq() perhaps the documentation can warn about this type of case or abuse. David's patch had the check only it was x86 based, if we use cond_resched_irq() we can leave that aspect out to be done through asm inlines or it'll use the generic shoudl_resched(), that should save some code on the asm implementations. I have some patches now which generalizees this, I also have more information about this can happen exactly, and a way to triggger it on small systems with some hacks to emulate possibly backend behaviour on larger systems. In the worst case this can be a dangerious situation to be in. I'll send some new RFTs. Luis -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm ivy bridge support?
- Original Message - From: Rik van Riel r...@redhat.com To: Thomas Lau t...@tetrioncapital.com, Paolo Bonzini pbonz...@redhat.com Cc: Yong Wang yong.y.w...@linux.intel.com, kvm kvm@vger.kernel.org Sent: Friday, December 5, 2014 4:23:54 PM Subject: Re: kvm ivy bridge support? -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 12/05/2014 03:17 AM, Thomas Lau wrote: virsh cpu-models x86_64 | sort ... Penryn SandyBridge Westmere ... interesting that it doesn't have Ivy Bridge, any reason? The reason would be that you are running an older version. No, the reason could also be that for some reason we added Haswell and Broadwell, but forgot Ivy Bridge. I'll send QEMU patches next week. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/6] Improve PSCI system events and fix reboot bugs
On Wed, Dec 03, 2014 at 10:18:36PM +0100, Christoffer Dall wrote: Several people have reported problems with rebooting ARM VMs, especially on 32-bit ARM. This is mainly due to the same reason we were seeing boot errors in the past, namely that the ram, dcache, and icache weren't coherent on guest boot with the guest (stage-1) MMU disabled. We solved this by ensuring coherency when we fault in pages, but since most memory is already mapped after a reboot, we don't do anything. The solution is to unmap the regular RAM on VCPU init, but we must take care to not unmap the GIC or other IO regions, hence the somehwat complicated solution. As part of figuring this out, it became clear that some semantics around the KVM_ARM_VCPU_INIT ABI and system event ABI was unclear (what is userspace expected to do when it receives a system event). This series also clarifies the ABI and changes the kernel functionality to do what userspace expects (turn off VCPUs on a system shutdown event). The code is avaliable here as well: http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git vcpu_init_fixes-v2 There is an alternative version with more code-reuse for the unmapping implementation for the previous version of this patch series available in the following git repo: http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git vcpu_init_fixes-alternative Testing --- This has been tested on CubieBoard, Arndale, TC2, and Juno. On Arndale and TC2 it was extremely easy to reproduce the problem (just start a VM that runs reboot from /etc/rc.local or similar) and this series clearly fixes the behavior. For the previous version of this series, I was seeing some problems on Juno, but it turned out to be because I wasn't limiting my testing to one of the clusters, and since we don't support re-initing a VCPU on a different physical host CPU (big.LITTLE), it was failing. For this version of the patch series, it has been running a reboot loop on Juno for hours. Just tested this version. Looks good. No problems after install nor after many, many reboots. drew Changelog - Changes v1-v2: - New patch to not clear the VCPU_POWER_OFF flag - Fixed spelling error in commit message - Adapted ABI texts based on Peter's feedback - Check for changed parameters to KVM_ARM_VCPU_INIT - Now unmap the Stage-2 RAM mappings at VCPU init instead of at PSCI system event time. Christoffer Dall (6): arm/arm64: KVM: Don't clear the VCPU_POWER_OFF flag arm/arm64: KVM: Correct KVM_ARM_VCPU_INIT power off option arm/arm64: KVM: Reset the HCR on each vcpu when resetting the vcpu arm/arm64: KVM: Clarify KVM_ARM_VCPU_INIT ABI arm/arm64: KVM: Turn off vcpus on PSCI shutdown/reboot arm/arm64: KVM: Introduce stage2_unmap_vm Documentation/virtual/kvm/api.txt| 17 +- arch/arm/include/asm/kvm_emulate.h | 5 +++ arch/arm/include/asm/kvm_host.h | 2 -- arch/arm/include/asm/kvm_mmu.h | 1 + arch/arm/kvm/arm.c | 56 ++- arch/arm/kvm/guest.c | 26 --- arch/arm/kvm/mmu.c | 65 arch/arm/kvm/psci.c | 19 +++ arch/arm64/include/asm/kvm_emulate.h | 5 +++ arch/arm64/include/asm/kvm_host.h| 3 +- arch/arm64/include/asm/kvm_mmu.h | 1 + arch/arm64/kvm/guest.c | 26 --- 12 files changed, 168 insertions(+), 58 deletions(-) -- 2.1.2.330.g565301e.dirty -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Stupid Xen vs KVM question
On Dec 5, 2014 8:09 AM, Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote: On Fri, Dec 05, 2014 at 08:29:54AM +0100, Paolo Bonzini wrote: On 05/12/2014 03:24, Konrad Rzeszutek Wilk wrote: We could do a simple thing - which is that the paravirt_enabled could have the value 1 for Xen and 2 for KVM. The assembler logic would be inverted and just check for 1. I am not going to attempt to write the assembler code :-) Wouldn't Xen HVM also want to be 2? Oddly enough it was never set! Looking at where the paravit_enabled() macro is used, on KVM it could be just set to zero. I noticed that KVM is setting paravirt_enabled = 1 twice, once the main kvm guest code and once in kvmclock. Will the EBDA code in head.c care? That would also increase sanity a little IMO. We currently have paravirt meaning that the normal HW architecture isn't present (Den, lguest) and paravirt meaning that there are extra optional-to-use hypervisor features (KVM, etc). --Andy -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] uio/uio_pci_generic: don't return zero on failure path in probe()
If uio_register_device() fails in probe(), it breaks off initialization, deallocates all resources, but returns zero. The patch adds proper error code propagation. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov khoroshi...@ispras.ru --- drivers/uio/uio_pci_generic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c index 077ae12269ce..d0b508b68f3c 100644 --- a/drivers/uio/uio_pci_generic.c +++ b/drivers/uio/uio_pci_generic.c @@ -91,7 +91,8 @@ static int probe(struct pci_dev *pdev, gdev-info.handler = irqhandler; gdev-pdev = pdev; - if (uio_register_device(pdev-dev, gdev-info)) + err = uio_register_device(pdev-dev, gdev-info); + if (err) goto err_register; pci_set_drvdata(pdev, gdev); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC] KVM: x86: nested: support for MSR loading/storing
BitVisor hypervisor fails to start a nested guest due to lack of MSR load/store support in KVM. This patch fixes this problem according to Intel SDM. Signed-off-by: Eugene Korenevsky ekorenev...@gmail.com --- arch/x86/include/asm/vmx.h| 6 ++ arch/x86/include/uapi/asm/msr-index.h | 3 + arch/x86/include/uapi/asm/vmx.h | 1 + arch/x86/kvm/vmx.c| 191 +- virt/kvm/kvm_main.c | 1 + 5 files changed, 197 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index bcbfade..e07414c 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -454,6 +454,12 @@ struct vmx_msr_entry { #define ENTRY_FAIL_VMCS_LINK_PTR 4 /* + * VMX Abort codes + */ +#define VMX_ABORT_MSR_STORE1 +#define VMX_ABORT_MSR_LOAD 4 + +/* * VM-instruction error numbers */ enum vm_instruction_error_number { diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index e21331c..3c9c601 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -316,6 +316,9 @@ #define MSR_IA32_UCODE_WRITE 0x0079 #define MSR_IA32_UCODE_REV 0x008b +#define MSR_IA32_SMM_MONITOR_CTL 0x009b +#define MSR_IA32_SMBASE0x009e + #define MSR_IA32_PERF_STATUS 0x0198 #define MSR_IA32_PERF_CTL 0x0199 #define MSR_AMD_PSTATE_DEF_BASE0xc0010064 diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h index 990a2fe..f5f8a62 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -56,6 +56,7 @@ #define EXIT_REASON_MSR_READ31 #define EXIT_REASON_MSR_WRITE 32 #define EXIT_REASON_INVALID_STATE 33 +#define EXIT_REASON_MSR_LOAD_FAILURE34 #define EXIT_REASON_MWAIT_INSTRUCTION 36 #define EXIT_REASON_MONITOR_INSTRUCTION 39 #define EXIT_REASON_PAUSE_INSTRUCTION 40 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6a951d8..deebc96 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8498,6 +8498,163 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12-guest_rip); } +struct msr_entry { + u32 index; + u32 reserved; + u64 data; +} __packed; + +static bool vmx_msr_switch_area_verify(u32 count, u64 addr, int maxphyaddr) +{ + if (count == 0) + return true; + if ((addr 0xf) != 0) + return false; + if ((addr maxphyaddr) != 0) + return false; + if (((addr + count * sizeof(struct msr_entry) - 1) maxphyaddr) != 0) + return false; + return true; +} + +static bool nested_vmx_msr_switch_verify(struct kvm_vcpu *vcpu, +struct vmcs12 *vmcs12) +{ + int maxphyaddr = cpuid_maxphyaddr(vcpu); + +#define VMCS12_MSR_SWITCH_VERIFY(name) { \ + if (!vmx_msr_switch_area_verify(vmcs12-name##_count, \ + vmcs12-name##_addr, maxphyaddr)) { \ + pr_warn_ratelimited( \ + %s: invalid MSR_{LOAD,STORE} parameters, \ + #name); \ + return false; \ + } \ + } + VMCS12_MSR_SWITCH_VERIFY(vm_exit_msr_load); + VMCS12_MSR_SWITCH_VERIFY(vm_exit_msr_store); + VMCS12_MSR_SWITCH_VERIFY(vm_entry_msr_load); + return true; +} + +static bool vmx_load_msr_entry_verify(struct kvm_vcpu *vcpu, + struct msr_entry *e) +{ + if (e-index == MSR_FS_BASE || e-index == MSR_GS_BASE) + return false; + /* SMM is not supported */ + if (e-index == MSR_IA32_SMM_MONITOR_CTL) + return false; + /* x2APIC MSR accesses are not allowed */ + if (apic_x2apic_mode(vcpu-arch.apic) (e-index 24) == 0x800) + return false; + if (e-reserved != 0) + return false; + return true; +} + +static bool vmx_msr_switch_is_protected_msr(u32 msr_index) +{ + /* Intel SDM: a processor MAY prevent writing to these MSRs when +* loading guest states on VM entries or saving guest states on VM +* exits. +* Assume our emulated processor DOES prevent writing */ + return msr_index == MSR_IA32_UCODE_WRITE || + msr_index == MSR_IA32_UCODE_REV; +} + +static unsigned int nested_vmx_load_msrs(struct kvm_vcpu *vcpu, +u32 count, u64 addr) +{ + unsigned int i; + struct msr_entry msr_entry; + struct msr_data msr; + + for (i = 1; i = count; i++, addr += sizeof(msr_entry)) { + if (kvm_read_guest(vcpu-kvm, addr, + msr_entry,
[PATCH] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
paravirt_enabled has the following effects: - Disables the F00F bug workaround warning. There is no F00F bug workaround any more because Linux's standard IDT handling already works around the F00F bug, but the warning still exists. This is only cosmetic, and, in any event, there is no such thing as KVM on a CPU with the F00F bug. - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there should be no APM BIOS anyway. - Disables tboot. I think that the tboot code should check the CPUID hypervisor bit directly if it matters. - paravirt_enabled disables espfix32. espfix32 should *not* be disabled under KVM paravirt. The last point is the purpose of this patch. It fixes a leak of the high 16 bits of the kernel stack address on 32-bit KVM paravirt guests. While I'm at it, this removes pv_info setup from kvmclock. That code seems to serve no purpose. Cc: sta...@vger.kernel.org Signed-off-by: Andy Lutomirski l...@amacapital.net --- arch/x86/kernel/kvm.c | 9 - arch/x86/kernel/kvmclock.c | 2 -- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index f6945bef2cd1..94f643484300 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -283,7 +283,14 @@ NOKPROBE_SYMBOL(do_async_page_fault); static void __init paravirt_ops_setup(void) { pv_info.name = KVM; - pv_info.paravirt_enabled = 1; + + /* +* KVM isn't paravirt in the sense of paravirt_enabled. A KVM +* guest kernel works like a bare metal kernel with additional +* features, and paravirt_enabled is about features that are +* missing. +*/ + pv_info.paravirt_enabled = 0; if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY)) pv_cpu_ops.io_delay = kvm_io_delay; diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index d9156ceecdff..d4d9a8ad7893 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -263,8 +263,6 @@ void __init kvmclock_init(void) #endif kvm_get_preset_lpj(); clocksource_register_hz(kvm_clock, NSEC_PER_SEC); - pv_info.paravirt_enabled = 1; - pv_info.name = KVM; if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 4/6] hw_random: fix unregister race.
On Wed, Nov 12, 2014 at 02:33:00PM +1030, Rusty Russell wrote: Amos Kong ak...@redhat.com writes: From: Rusty Russell ru...@rustcorp.com.au The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. v4: add cleanup_done flag to insure that cleanup is done That's a bit weird. The usual pattern would be to hold a reference until we're actually finished, but this reference is a bit weird. We hold the mutex across cleanup, so we could grab that but we have to take care sleeping inside wait_event, otherwise Peter will have to fix my code again :) AFAICT the wake_woken() stuff isn't merged yet, so your patch will have to do for now. @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref) if (rng-cleanup) rng-cleanup(rng); + rng-cleanup_done = true; + wake_up_all(rng_done); } static void set_current_rng(struct hwrng *rng) @@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng) kthread_stop(hwrng_fill); } else mutex_unlock(rng_mutex); + + /* Just in case rng is reading right now, wait. */ + wait_event(rng_done, rng-cleanup_done + atomic_read(rng-ref.refcount) == 0); + The atomic_read() isn't necessary here. At least, we need it to convert refcount from atomic_t to int. Otherwise, we will touch this error: error: invalid operands to binary == (have 'atomic_t' and 'int') However, you should probably init cleanup_done in hwrng_register(). (Probably noone does unregister then register, but let's be clear). Thanks, Rusty. -- Amos. signature.asc Description: Digital signature
[PATCH v5 4/6] hw_random: fix unregister race.
From: Rusty Russell ru...@rustcorp.com.au The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. v5: reset cleanup_done flag, use compiler barrier to prevent recording. v4: add cleanup_done flag to insure that cleanup is done Signed-off-by: Rusty Russell ru...@rustcorp.com.au Signed-off-by: Amos Kong ak...@redhat.com --- drivers/char/hw_random/core.c | 12 include/linux/hw_random.h | 1 + 2 files changed, 13 insertions(+) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 83516cb..067270b 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex); static DEFINE_MUTEX(reading_mutex); static int data_avail; static u8 *rng_buffer, *rng_fillbuf; +static DECLARE_WAIT_QUEUE_HEAD(rng_done); static unsigned short current_quality; static unsigned short default_quality; /* = 0; default to off */ @@ -98,6 +99,11 @@ static inline void cleanup_rng(struct kref *kref) if (rng-cleanup) rng-cleanup(rng); + + /* cleanup_done should be updated after cleanup finishes */ + smp_wmb(); + rng-cleanup_done = true; + wake_up_all(rng_done); } static void set_current_rng(struct hwrng *rng) @@ -498,6 +504,8 @@ int hwrng_register(struct hwrng *rng) add_early_randomness(rng); } + rng-cleanup_done = false; + out_unlock: mutex_unlock(rng_mutex); out: @@ -529,6 +537,10 @@ void hwrng_unregister(struct hwrng *rng) kthread_stop(hwrng_fill); } else mutex_unlock(rng_mutex); + + /* Just in case rng is reading right now, wait. */ + wait_event(rng_done, rng-cleanup_done + atomic_read(rng-ref.refcount) == 0); } EXPORT_SYMBOL_GPL(hwrng_unregister); diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h index c212e71..7832e50 100644 --- a/include/linux/hw_random.h +++ b/include/linux/hw_random.h @@ -46,6 +46,7 @@ struct hwrng { /* internal. */ struct list_head list; struct kref ref; + bool cleanup_done; }; /** Register a new Hardware Random Number Generator driver. */ -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 6/6] hw_random: don't init list element we're about to add to list.
From: Rusty Russell ru...@rustcorp.com.au Another interesting anti-pattern. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index a9286bf..4d13ac5 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -489,7 +489,6 @@ int hwrng_register(struct hwrng *rng) goto out_unlock; } } - INIT_LIST_HEAD(rng-list); list_add_tail(rng-list, rng_list); if (old_rng !rng-init) { -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 5/6] hw_random: don't double-check old_rng.
From: Rusty Russell ru...@rustcorp.com.au Interesting anti-pattern. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 067270b..a9286bf 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -476,14 +476,13 @@ int hwrng_register(struct hwrng *rng) } old_rng = current_rng; + err = 0; if (!old_rng) { err = hwrng_init(rng); if (err) goto out_unlock; set_current_rng(rng); - } - err = 0; - if (!old_rng) { + err = register_miscdev(); if (err) { drop_current_rng(); -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 3/6] hw_random: use reference counts on each struct hwrng.
From: Rusty Russell ru...@rustcorp.com.au current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. v5: drop redundant kref_init() v4: decrease last reference for triggering the cleanup v3: initialize kref (thanks Amos Kong) v2: fix missing put_rng() on exit path (thanks Amos Kong) Signed-off-by: Rusty Russell ru...@rustcorp.com.au Signed-off-by: Amos Kong ak...@redhat.com --- drivers/char/hw_random/core.c | 135 -- include/linux/hw_random.h | 2 + 2 files changed, 94 insertions(+), 43 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index a0905c8..83516cb 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -42,6 +42,7 @@ #include linux/delay.h #include linux/slab.h #include linux/random.h +#include linux/err.h #include asm/uaccess.h @@ -91,6 +92,60 @@ static void add_early_randomness(struct hwrng *rng) add_device_randomness(bytes, bytes_read); } +static inline void cleanup_rng(struct kref *kref) +{ + struct hwrng *rng = container_of(kref, struct hwrng, ref); + + if (rng-cleanup) + rng-cleanup(rng); +} + +static void set_current_rng(struct hwrng *rng) +{ + BUG_ON(!mutex_is_locked(rng_mutex)); + kref_get(rng-ref); + current_rng = rng; +} + +static void drop_current_rng(void) +{ + BUG_ON(!mutex_is_locked(rng_mutex)); + if (!current_rng) + return; + + /* decrease last reference for triggering the cleanup */ + kref_put(current_rng-ref, cleanup_rng); + current_rng = NULL; +} + +/* Returns ERR_PTR(), NULL or refcounted hwrng */ +static struct hwrng *get_current_rng(void) +{ + struct hwrng *rng; + + if (mutex_lock_interruptible(rng_mutex)) + return ERR_PTR(-ERESTARTSYS); + + rng = current_rng; + if (rng) + kref_get(rng-ref); + + mutex_unlock(rng_mutex); + return rng; +} + +static void put_rng(struct hwrng *rng) +{ + /* +* Hold rng_mutex here so we serialize in case they set_current_rng +* on rng again immediately. +*/ + mutex_lock(rng_mutex); + if (rng) + kref_put(rng-ref, cleanup_rng); + mutex_unlock(rng_mutex); +} + static inline int hwrng_init(struct hwrng *rng) { if (rng-init) { @@ -113,12 +168,6 @@ static inline int hwrng_init(struct hwrng *rng) return 0; } -static inline void hwrng_cleanup(struct hwrng *rng) -{ - if (rng rng-cleanup) - rng-cleanup(rng); -} - static int rng_dev_open(struct inode *inode, struct file *filp) { /* enforce read-only access to this chrdev */ @@ -154,21 +203,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ssize_t ret = 0; int err = 0; int bytes_read, len; + struct hwrng *rng; while (size) { - if (mutex_lock_interruptible(rng_mutex)) { - err = -ERESTARTSYS; + rng = get_current_rng(); + if (IS_ERR(rng)) { + err = PTR_ERR(rng); goto out; } - - if (!current_rng) { + if (!rng) { err = -ENODEV; - goto out_unlock; + goto out; } mutex_lock(reading_mutex); if (!data_avail) { - bytes_read = rng_get_data(current_rng, rng_buffer, + bytes_read = rng_get_data(rng, rng_buffer, rng_buffer_size(), !(filp-f_flags O_NONBLOCK)); if (bytes_read 0) { @@ -200,8 +250,8 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ret += len; } - mutex_unlock(rng_mutex); mutex_unlock(reading_mutex); + put_rng(rng); if (need_resched()) schedule_timeout_interruptible(1); @@ -213,12 +263,11 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, } out: return ret ? : err; -out_unlock: - mutex_unlock(rng_mutex); - goto out; + out_unlock_reading: mutex_unlock(reading_mutex); - goto out_unlock; + put_rng(rng); + goto out; } @@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
[PATCH v5 0/6] fix hw_random stuck
When I hotunplug a busy virtio-rng device or try to access hwrng attributes in non-smp guest, it gets stuck. My hotplug tests: | test 0: | hotunplug rng device from qemu monitor | | test 1: | guest) # dd if=/dev/hwrng of=/dev/null | hotunplug rng device from qemu monitor | | test 2: | guest) # dd if=/dev/random of=/dev/null | hotunplug rng device from qemu monitor | | test 4: | guest) # dd if=/dev/hwrng of=/dev/null | cat /sys/devices/virtual/misc/hw_random/rng_* | | test 5: | guest) # dd if=/dev/hwrng of=/dev/null | cancel dd process after 10 seconds | guest) # dd if=/dev/hwrng of=/dev/null | hotunplug rng device from qemu monitor | | test 6: | use a fifo as rng backend, execute test 0 ~ 5 with no input of fifo V5: reset cleanup_done flag, drop redundant init of reference count, use compiler barrier to prevent recording. V4: update patch 4 to fix corrupt, decrease last reference for triggering the cleanup, fix unregister race pointed by Herbert V3: initialize kref to 1 V2: added patch 2 to fix a deadlock, update current patch 3 to fix reference counting issue Amos Kong (1): hw_random: move some code out mutex_lock for avoiding underlying deadlock Rusty Russell (5): hw_random: place mutex around read functions and buffers. hw_random: use reference counts on each struct hwrng. hw_random: fix unregister race. hw_random: don't double-check old_rng. hw_random: don't init list element we're about to add to list. drivers/char/hw_random/core.c | 173 ++ include/linux/hw_random.h | 3 + 2 files changed, 126 insertions(+), 50 deletions(-) -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 2/6] hw_random: move some code out mutex_lock for avoiding underlying deadlock
In next patch, we use reference counting for each struct hwrng, changing reference count also needs to take mutex_lock. Before releasing the lock, if we try to stop a kthread that waits to take the lock to reduce the referencing count, deadlock will occur. Signed-off-by: Amos Kong ak...@redhat.com --- drivers/char/hw_random/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index b1b6042..a0905c8 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -474,12 +474,12 @@ void hwrng_unregister(struct hwrng *rng) } } if (list_empty(rng_list)) { + mutex_unlock(rng_mutex); unregister_miscdev(); if (hwrng_fill) kthread_stop(hwrng_fill); - } - - mutex_unlock(rng_mutex); + } else + mutex_unlock(rng_mutex); } EXPORT_SYMBOL_GPL(hwrng_unregister); -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 1/6] hw_random: place mutex around read functions and buffers.
From: Rusty Russell ru...@rustcorp.com.au There's currently a big lock around everything, and it means that we can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current) while the rng is reading. This is a real problem when the rng is slow, or blocked (eg. virtio_rng with qemu's default /dev/random backend) This doesn't help (it leaves the current lock untouched), just adds a lock to protect the read function and the static buffers, in preparation for transition. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25..b1b6042 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -53,7 +53,10 @@ static struct hwrng *current_rng; static struct task_struct *hwrng_fill; static LIST_HEAD(rng_list); +/* Protects rng_list and current_rng */ static DEFINE_MUTEX(rng_mutex); +/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */ +static DEFINE_MUTEX(reading_mutex); static int data_avail; static u8 *rng_buffer, *rng_fillbuf; static unsigned short current_quality; @@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng) unsigned char bytes[16]; int bytes_read; + mutex_lock(reading_mutex); bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1); + mutex_unlock(reading_mutex); if (bytes_read 0) add_device_randomness(bytes, bytes_read); } @@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, int wait) { int present; + BUG_ON(!mutex_is_locked(reading_mutex)); if (rng-read) return rng-read(rng, (void *)buffer, size, wait); @@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, goto out_unlock; } + mutex_lock(reading_mutex); if (!data_avail) { bytes_read = rng_get_data(current_rng, rng_buffer, rng_buffer_size(), !(filp-f_flags O_NONBLOCK)); if (bytes_read 0) { err = bytes_read; - goto out_unlock; + goto out_unlock_reading; } data_avail = bytes_read; } @@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (!data_avail) { if (filp-f_flags O_NONBLOCK) { err = -EAGAIN; - goto out_unlock; + goto out_unlock_reading; } } else { len = data_avail; @@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (copy_to_user(buf + ret, rng_buffer + data_avail, len)) { err = -EFAULT; - goto out_unlock; + goto out_unlock_reading; } size -= len; @@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, } mutex_unlock(rng_mutex); + mutex_unlock(reading_mutex); if (need_resched()) schedule_timeout_interruptible(1); @@ -208,6 +216,9 @@ out: out_unlock: mutex_unlock(rng_mutex); goto out; +out_unlock_reading: + mutex_unlock(reading_mutex); + goto out_unlock; } @@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused) while (!kthread_should_stop()) { if (!current_rng) break; + mutex_lock(reading_mutex); rc = rng_get_data(current_rng, rng_fillbuf, rng_buffer_size(), 1); + mutex_unlock(reading_mutex); if (rc = 0) { pr_warn(hwrng: no data available\n); msleep_interruptible(1); continue; } + /* Outside lock, sure, but y'know: randomness. */ add_hwgenerator_randomness((void *)rng_fillbuf, rc, rc * current_quality * 8 10); } -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM
On 12/05/2014 04:50 PM, Gerd Hoffmann wrote: A few comments on the kernel stuff (brief look so far, also compile-tested only, intel gfx on my test machine is too old). * Noticed the kernel bits don't even compile when configured as module. Everything (vgt, i915, kvm) must be compiled into the kernel. Yes, that's planned to be done along with separating hypervisor-related code from vgt. * Design approach still seems to be i915 on vgt not the other way around. So far yes. Qemu/SeaBIOS bits: I've seen the host bridge changes identity from i440fx to copy-pci-ids-from-host. Guess the reason for this is that seabios uses this device to figure whenever it is running on i440fx or q35. Correct? I did some trick in seabios/qemu. The purpose is to make qemu: - provide IDs of an old host bridge to SeaBIOS - provide IDs of new host bridge(the physical ones) to guest OS So I made seabios to tell qemu that POST is done before jumping to guest OS context. This may be the simplest method to make things work, but yes, q35 emulation of qemu may have this unnecessary, see below. What are the exact requirements for the device? Must it match the host exactly, to not confuse the guest intel graphics driver? Or would something more recent -- such as the q35 emulation qemu has -- be good enough to make things work (assuming we add support for the graphic-related pci config space registers there)? I don't know that is exactly needed, we also need to have Windows driver considered. However, I'm quite confident that, if things gonna work for IGD passthrough, it gonna work for GVT-g. The patch also adds a dummy isa bridge at 0x1f. Simliar question here: What exactly is needed here? Would things work if we simply use the q35 lpc device here? Ditto. more to come after I've read the paper linked above ... Thanks for review :) cheers, Gerd -- Thanks, Jike -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM
CC Andy :) On 12/05/2014 09:03 PM, Paolo Bonzini wrote: On 05/12/2014 09:50, Gerd Hoffmann wrote: A few comments on the kernel stuff (brief look so far, also compile-tested only, intel gfx on my test machine is too old). * Noticed the kernel bits don't even compile when configured as module. Everything (vgt, i915, kvm) must be compiled into the kernel. I'll add that the patch is basically impossible to review with all the XenGT bits still in. For example, the x86 emulator seems to be unnecessary for KVMGT, but I am not 100% sure. This is not ready for merge yet, please wait for a while, we'll have Xen/KVM specific code separated. BTW, definitely you are right, the emulator is unnecessary for KVMGT, and ... unnecessary for XenGT :) I would like a clear understanding of why/how Andrew Barnes was able to do i915 passthrough (GVT-d) without hacking the ISA bridge, and why this does not apply to GVT-g. AFAIK, the graphics drivers need to figure out the offset of some MMIO registers, by the IDs of this ISA bridge. It simply won't work without this information. Talked with Andy about the pass-through but I don't have his implementation, CC Andy for his advice :) Paolo Thanks for review. Would you please also have a look at the issues I mentioned in the original email? they are most KVM-related: the SRCU trickiness, domid, and the memslot created in kernel. Thank you! -- Thanks, Jike -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Intel-gfx] [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM
On 12/05/2014 09:54 PM, Daniel Vetter wrote: Yeah done a quick read-through of just the i915 bits too, same comment. I guess this is just the first RFC and the redesign we've discussed about already with xengt is in progress somewhere? Yes, it's marching on with Xen now. The KVM implementation is currently not even feature complete - we still have PPGTT missing. Thanks, Daniel -- Thanks, Jike -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 0/6] fix hw_random stuck
On Sat, Dec 06, 2014 at 12:16:36PM +0800, Amos Kong wrote: When I hotunplug a busy virtio-rng device or try to access hwrng attributes in non-smp guest, it gets stuck. Please resend these via the linux-crypto mailing list so they can be picked up by patchwork. Thanks, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html