Re: [CFT PATCH v2 0/2] KVM: support XSAVES usage in the host
Wanpeng Li wanpeng...@linux.intel.com wrote: Hi all, On Tue, Nov 25, 2014 at 04:50:06PM +0200, Nadav Amit wrote: On Nov 25, 2014, at 16:17, Paolo Bonzini pbonz...@redhat.com wrote: On 25/11/2014 15:05, Nadav Amit wrote: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 373b0ab9a32e..ca26681455c2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6955,6 +6955,9 @@ int fx_init(struct kvm_vcpu *vcpu) return err; fpu_finit(vcpu-arch.guest_fpu); + if (cpu_has_xsaves) + vcpu-arch.guest_fpu.state-xsave.xsave_hdr.xcomp_bv = + host_xcr0 | XSTATE_COMPACTION_ENABLED; /* * Ensure guest xcr0 is valid for loading The second version works for me (w/qemu v2.1.0; Linux 3.13 guest). I Could you try 3.17 guest which has xsaves enabled? Because I'm not sure if the below codes from Paolo is enough to mask XSAVES, should we also add F(XSAVES)? I guess this paragraph is intended for me. As I said before, I got limited access to the machine - I will not be able to run these experiments before Sunday, and I am not sure that I will have time to fully debug it. If you insist, the very least please run some experiments running a 3.16 (or older) guest kernel and running 3.17 with ‘noxsaves’ kernel parameter. + const u32 kvm_supported_word10_x86_features = + F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1); + In addition, the 3.17 guest is still hang as I mentioned even if I add the F(XSAVES) to the kvm_supported_word10_x86_features. did not need to apply this patch on top. [although I am not sure whether relying on userspace to call KVM_SET_XSAVE early enough is a good practice]. Did you actually try the patch? :) If it works, I'm tempted to apply it anyway. Yes, I tried it both with and without this patch. Due to time constraints I only tested minimal functionality (Linux boot). I will run more tests in the near future. Anyhow, you can put the: Tested-by: Nadav Amit na...@cs.technion.ac.il One disclaimer: Since I got limited time with the machine, I executed a slightly modified kernel/qemu, and not the latest version. Anyhow, I don’t think these differences can have any impact. Yes, that is no problem. I am just worried that Wanpeng reported it fails, while I report it works... I have another patch which enable xsaves in KVM and the patch is still under debug with Paolo's patch KVM: x86: support XSAVES usage in the host, so the 1/2 patch from Paolo can be dropped if my patch is ready. Anyway, a quick fix is needed before enable xsaves in kvm. xsaves is already enabled in KVM (the host) since 3.17. It just didn’t work… Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CFT PATCH v2 0/2] KVM: support XSAVES usage in the host
Hi Nadav, On Wed, Nov 26, 2014 at 11:00:34AM +0200, Nadav Amit wrote: Wanpeng Li wanpeng...@linux.intel.com wrote: Hi all, On Tue, Nov 25, 2014 at 04:50:06PM +0200, Nadav Amit wrote: On Nov 25, 2014, at 16:17, Paolo Bonzini pbonz...@redhat.com wrote: On 25/11/2014 15:05, Nadav Amit wrote: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 373b0ab9a32e..ca26681455c2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6955,6 +6955,9 @@ int fx_init(struct kvm_vcpu *vcpu) return err; fpu_finit(vcpu-arch.guest_fpu); +if (cpu_has_xsaves) +vcpu-arch.guest_fpu.state-xsave.xsave_hdr.xcomp_bv = +host_xcr0 | XSTATE_COMPACTION_ENABLED; /* * Ensure guest xcr0 is valid for loading The second version works for me (w/qemu v2.1.0; Linux 3.13 guest). I Could you try 3.17 guest which has xsaves enabled? Because I'm not sure if the below codes from Paolo is enough to mask XSAVES, should we also add F(XSAVES)? I guess this paragraph is intended for me. As I said before, I got limited access to the machine - I will not be able to run these experiments before Sunday, and I am not sure that I will have time to fully debug it. If you insist, the very least please run some experiments running a 3.16 (or older) guest kernel and running 3.17 with ‘noxsaves’ kernel parameter. + const u32 kvm_supported_word10_x86_features = + F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1); + In addition, the 3.17 guest is still hang as I mentioned even if I add the F(XSAVES) to the kvm_supported_word10_x86_features. did not need to apply this patch on top. [although I am not sure whether relying on userspace to call KVM_SET_XSAVE early enough is a good practice]. Did you actually try the patch? :) If it works, I'm tempted to apply it anyway. Yes, I tried it both with and without this patch. Due to time constraints I only tested minimal functionality (Linux boot). I will run more tests in the near future. Anyhow, you can put the: Tested-by: Nadav Amit na...@cs.technion.ac.il One disclaimer: Since I got limited time with the machine, I executed a slightly modified kernel/qemu, and not the latest version. Anyhow, I don’t think these differences can have any impact. Yes, that is no problem. I am just worried that Wanpeng reported it fails, while I report it works... I have another patch which enable xsaves in KVM and the patch is still under debug with Paolo's patch KVM: x86: support XSAVES usage in the host, so the 1/2 patch from Paolo can be dropped if my patch is ready. Anyway, a quick fix is needed before enable xsaves in kvm. xsaves is already enabled in KVM (the host) since 3.17. It just didn’t work… Not yet, it is enabled in native currrently instead of vmx and my patch is still under debug. Regards, Wanpeng Li Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-i386: add feature flags for CPUID[EAX=0xd,ECX=1]
On 25/11/2014 21:02, Paolo Bonzini wrote: +static const char *cpuid_xsave_feature_name[] = { +xsaveopt, xsavec, xgetbv1, xsaves, None of the above features introduce any new state that might need to be migrated, or will require other changes in QEMU to work, right? It looks like they don't introduce any extra state, but if they do, they need to be added to unmigratable_flags until migration support is implemented. If they require other QEMU changes, it would be nice if KVM reported them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it wouldn't break -cpu host. No, they don't. Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right. It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new MSRs and new XSAVE areas (last example: avx512). Since no hardware really exists for it, and KVM does not support it anyway, I think it's simplest to leave xsaves out for now. Is this right? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 1/2] KVM: don't check for PF_VCPU when yielding
This change is a trade-off. PRO: This patch would improve the case of preemption on s390. This is probably a corner case as most distros have preemption off anyway. CON: The downside is that kvm_vcpu_yield_to is called also from kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong decision. Won't most of that part be covered by: if (!ACCESS_ONCE(vcpu-preempted)) vcpu-preempted is only set when scheduled out involuntarily. It is cleared when scheduled in. s390 sets it manually, to speed up waking up a vcpu. So when our task is scheduled in (an PF_VCPU is set), this check will already avoid scheduler overhead in kvm_vcpu_on_spin() or am I missing something? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 1/2] KVM: don't check for PF_VCPU when yielding
Am 26.11.2014 um 10:23 schrieb David Hildenbrand: This change is a trade-off. PRO: This patch would improve the case of preemption on s390. This is probably a corner case as most distros have preemption off anyway. CON: The downside is that kvm_vcpu_yield_to is called also from kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong decision. Won't most of that part be covered by: if (!ACCESS_ONCE(vcpu-preempted)) Hmm, right. Checking vcpu-preempted and PF_VCPU might boil down to the same. Would be good if to have to performance regression test, though. vcpu-preempted is only set when scheduled out involuntarily. It is cleared when scheduled in. s390 sets it manually, to speed up waking up a vcpu. So when our task is scheduled in (an PF_VCPU is set), this check will already avoid scheduler overhead in kvm_vcpu_on_spin() or am I missing something? CC Raghavendra KT. Could be rerun your kernbench/sysbench/ebizzy setup on x86 to see if the patch in this thread causes any regression? If think your commit 7bc7ae25b143kvm: Iterate over only vcpus that are preempted might have really made the PF_VCPU check unnecessary CC Michael Mueller, do we still have our yield performance setup handy to check if this patch causes any regression? Christian -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] KVM: arm: add irqfd support
On Tue, Nov 25, 2014 at 02:12:31PM +0100, Eric Auger wrote: On 11/25/2014 11:19 AM, Christoffer Dall wrote: [Re-adding cc list] On Mon, Nov 24, 2014 at 05:42:30PM +0100, Eric Auger wrote: On 11/24/2014 04:47 PM, Christoffer Dall wrote: On Mon, Nov 24, 2014 at 12:02 PM, Eric Auger eric.au...@linaro.org wrote: On 11/24/2014 11:00 AM, Christoffer Dall wrote: On Sun, Nov 23, 2014 at 06:56:59PM +0100, Eric Auger wrote: This patch enables irqfd on arm. Both irqfd and resamplefd are supported. Injection is implemented in vgic.c without routing. This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. Signed-off-by: Eric Auger eric.au...@linaro.org --- [...] +int kvm_set_irq(struct kvm *kvm, int irq_source_id, +u32 irq, int level, bool line_status) +{ +unsigned int spi = irq + VGIC_NR_PRIVATE_IRQS; + +kvm_debug(irqfd sets vIRQ %d to %d\n, irq, level); + +if (likely(vgic_initialized(kvm))) { +if (spi kvm-arch.vgic.nr_irqs) +return -EINVAL; +return kvm_vgic_inject_irq(kvm, 0, spi, level); +} else if (level kvm_irq_has_notifier(kvm, 0, irq)) { +/* + * any IRQ injected before vgic readiness is + * ignored and the notifier, if any, is called + * immediately as if the virtual IRQ were completed + * by the guest + */ +kvm_notify_acked_irq(kvm, 0, irq); +return -EAGAIN; This looks fishy, I don't fully understand which case you're catering towards here (the comment is hard to understand), but my gut feeling is that you should back out (probably with an error) if the vgic is not initialized when calling this function. Setting up the routing in the first place should probably error out if the vgic is not available. The issue is related to integration with QEMU and VFIO. Currently VFIO signaling (we tell VFIO to signal the eventfd on a peculiar physical IRQ) and irqfd setup (we tell KVM/IRQFD to handle previous eventfd when triggered and inject a GSI) are done by QEMU *before* the first vcpu run. so before VGIC is ready. Both are done in a so called QEMU machine init done notifier. It could be done in a QEMU reset notifier but I guess the problem would be the same. and I think the same at which QEMU initializes it is correct. As soon as both VFIO signaling and irqfd are setup, virtual IRQ are likely to be injected and this is what happens on some circumstances. This happens on the 2d QEMU run after killing the 1st QEMU session. For some reason I guess the HW device - in my case the xgmac - was released in such a way the IRQ wire was not reset. So as soon as VFIO driver calls request_irq, the IRQ hits. I tried to change that this xgmac driver behavior but I must acknowledge that for the time beeing I failed. I will continue investigating that. Indeed I might be fixing the issue at the wrong place. But anyway this relies on the fact the assigned device driver toggled down the IRQ properly when releasing. At the time the signaling is setup we have not entered yet any driver reset code. I see, it's because of the quirky way we initialize the vgic at time of running the first CPU, so user space can't really hook into the sweet spot after initializing the vgic but just before actually running guest code. Yes currently irqfd generic code has no way to test if the virtual interrupt controller is ready (KVM__IRQFD - kvm_irqfd_assign ) because I guess other archs don't have the problem. Could it be that we simply need to let user space init the vgic after creating all its vcpus and only then allow setting up the irqfd? I start VFIO signaling and setup KVM_IRQFD in a QEMU machine init done notifier. the vgic init then must happen before then. have all the vcpus that QEMU wants to create, been created by then? Hi Christoffer, My understanding of QEMU start sequence is as follows: at machine init, CPU realize function is called for each CPU (target-arm/cpu.c arm_cpu_realizefn) qemu_init_vcpu (cpus.c) qemu_kvm_start_vcpu (cpus.c) launch VCPU thread (qemu_kvm_cpu_thread_fn) kvm_init_vcpu kvm_init_vcpu does the while (1) loop issuing KVM_RUN ioctl if *cpu_can_run* cpu_can_run returns true when vm_start is called (resume_all_vcpuscpu_resume) QEMU machine init done or reset callbacks happen after machine init and before vm_start. The actual vgic readiness is set in the first vcpu run, ie on the vm_start. vgic instantiation in virt machine file is done after cpu_instantiation. We could force vgic init at that time, in the gic realize fn (kvm_arm_gic_realize)?. All vcpus are instantiated: nr_vcpus and nr_irqs are
Re: [PATCH] target-i386: add feature flags for CPUID[EAX=0xd,ECX=1]
On Wed, Nov 26, 2014 at 10:20:12AM +0100, Paolo Bonzini wrote: On 25/11/2014 21:02, Paolo Bonzini wrote: +static const char *cpuid_xsave_feature_name[] = { +xsaveopt, xsavec, xgetbv1, xsaves, None of the above features introduce any new state that might need to be migrated, or will require other changes in QEMU to work, right? It looks like they don't introduce any extra state, but if they do, they need to be added to unmigratable_flags until migration support is implemented. If they require other QEMU changes, it would be nice if KVM reported them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it wouldn't break -cpu host. No, they don't. Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right. It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new MSRs and new XSAVE areas (last example: avx512). If the changes needed are only to support migration (this is the case if it's just another MSR handled by KVM, or additional XSAVE areas), GET_SUPPORTED_CPUID is still reasonable, because features that are unknown to QEMU are always considered unmigratable until we add the feature name to the feature_name arrays. (That's why we need to know if the feature introduces additional state when adding the feature names to the array.) If other changes are required to make the feature work even if no migration is required, then adding them to GET_SUPPORTED_CPUID would break -cpu host on older QEMUs. I don't think that's the case here, but I wanted to confirm. (Should we add those observations to Documentation/virtual/kvm/api.txt?) Since no hardware really exists for it, and KVM does not support it anyway, I think it's simplest to leave xsaves out for now. Is this right? If unsure, it won't hurt to add the feature to unmigratable_features by now. Making QEMU aware of the feature name is still useful. -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host
2014-11-24 17:43+0100, Paolo Bonzini: Userspace is expecting non-compacted format for KVM_GET_XSAVE, but struct xsave_struct might be using the compacted format. Convert in order to preserve userspace ABI. Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE but the kernel will pass it to XRSTORS, and we need to convert back. Future instructions might force us to calling xsave/xrstor directly, so we could do that even now and save the explicit conversion ... What I mean is: we could be using the native xsave.*/xrstor.* while in kernel and use xsave/xrstor for communication with userspace. Hardware would take care of everything in the conversion. get_xsave = native_xrstor(guest_xsave); xsave(aligned_userspace_buffer) set_xsave = xrstor(aligned_userspace_buffer); native_xsave(guest_xsave) Could that work? Fixes: f31a9f7c71691569359fa7fb8b0acaa44bce0324 Cc: Fenghua Yu fenghua...@intel.com Cc: H. Peter Anvin h...@linux.intel.com Cc: Nadav Amit na...@cs.technion.ac.il Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- arch/x86/kvm/x86.c | 87 +- 1 file changed, 80 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 08b5657e57ed..373b0ab9a32e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3132,15 +3132,89 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, return 0; } +#define XSTATE_COMPACTION_ENABLED (1ULL 63) (arch/x86/include/asm/xsave.h) + +static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu) +{ + struct xsave_struct *xsave = vcpu-arch.guest_fpu.state-xsave; + u64 xstate_bv = vcpu-arch.guest_supported_xcr0 | XSTATE_FPSSE; (I don't think this is necessary. We haven't modified it before and userspace worked, so we can save explicit copying of initialized data.) + u64 valid; + + /* + * Copy legacy XSAVE area, to avoid complications with CPUID + * leaves 0 and 1 in the loop below. + */ + memcpy(dest, xsave, XSAVE_HDR_OFFSET); (Yeah, there is an exception for SSE; I don't see any effect it has on restore though, so we could probably ignore it as well.) + + /* Set XSTATE_BV */ + *(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv; + + /* + * Copy each region from the possibly compacted offset to the + * non-compacted offset. + */ + valid = xstate_bv ~XSTATE_FPSSE; (We could read xstate_bv from xsave and it with supported.) + while (valid) { + u64 feature = valid -valid; + int index = fls64(feature) - 1; + void *src = get_xsave_addr(xsave, feature); (xcomp_bv never changes, so it works for compacted xsave.) + + if (src) { + u32 size, offset, ecx, edx; + cpuid_count(XSTATE_CPUID, index, + size, offset, ecx, edx); (ok, setup_xstate_features() has the same code.) + memcpy(dest + offset, src, size); + } + + valid -= feature; + } +} + +static void load_xsave(struct kvm_vcpu *vcpu, u8 *src) +{ + struct xsave_struct *xsave = vcpu-arch.guest_fpu.state-xsave; + u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET); + u64 valid; + + /* + * Copy legacy XSAVE area, to avoid complications with CPUID + * leaves 0 and 1 in the loop below. + */ + memcpy(xsave, src, XSAVE_HDR_OFFSET); + + /* Set XSTATE_BV and possibly XCOMP_BV. */ + xsave-xsave_hdr.xstate_bv = xstate_bv; + if (cpu_has_xsaves) + xsave-xsave_hdr.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED; Userspace can trigger a #GP if it passes xstate_bv bit that isn't in xcomp_bv, so we could them back into xstate_bv as well. (Linux probably won't start using IA32_XSS, so using just xcr0 is fine.) + + /* + * Copy each region from the non-compacted offset to the + * possibly compacted offset. + */ + valid = xstate_bv ~XSTATE_FPSSE; + while (valid) { + u64 feature = valid -valid; + int index = fls64(feature) - 1; + void *dest = get_xsave_addr(xsave, feature); + + if (dest) { + u32 size, offset, ecx, edx; + cpuid_count(XSTATE_CPUID, index, + size, offset, ecx, edx); + memcpy(dest, src + offset, size); + } else + WARN_ON_ONCE(1); + + valid -= feature; + } +} + static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave) { if (cpu_has_xsave) { - memcpy(guest_xsave-region, - vcpu-arch.guest_fpu.state-xsave, - vcpu-arch.guest_xstate_size); - *(u64
Re: Exposing host debug capabilities to userspace
Paolo Bonzini pbonz...@redhat.com writes: On 25/11/2014 17:35, Alexander Graf wrote: Unfortunately on ARM you also have a few other constraints - the debug register space is partitioned into magic super debug registers at the top (with an implementation specific amount) and normal debug registers in the lower end of the space. Does the gdbstub ever need to use the magic super debug registers? Even if it does, the logic is the same as x86. On x86 you might run out of breakpoints, on ARM you might run out of breakpoints in general or magic super breakpoints in particular. By default gdbstub uses normal software breakpoints unless a) the user asks for a HW one or b) the memory is RO (i.e a ROM). For watchpoints the situation is reversed and they are used by default if available. I would just treat gdbstub debugging as the ugly step child that it is and not introduce anything special for it (except for debug event deflection to user space). Then only ever work on guest debug registers and call it a day. Chances are just too high that we get the interfaces wrong and shoot ourselves in the foot. I agree. But we do need a way to retrieve the number of valid fields in struct kvm_guest_debug_arch, if that is not architecturally defined (e.g. on x86 it's just always 4). It's IMPDEF (implementation defined) up to a maximum of 16. A hidden ONE_REG, whose existence is determined by (ARM || ARM64) kvm_check_extension(KVM_CAP_SET_GUEST_DEBUG), is certainly fine and I like it because it doesn't pollute the global KVM_CAP_* namespace. My original implementation more or less did that but my main worry is migration code tends to iterate over the whole ONE_REG list and introducing a value that doesn't happen to be strictly the guests is semantically impure. Of course if you mean by hidden don't export it via GET_REG_LIST then I guess that would work. Does seem a little ugly as the internal KVM code now needs to learn about hidden and non-hidden registers. The alternative is a capability; however, if you start with two capabilities you'll end up needing four, and then realize that returning a struct via ONE_REG would have been a better idea. :) We already have how many breakpoints, how many watchpoints, how many magic super debug registers,... Only two, not sure what the super debug registers are? Certainly on the QEMU side the capability based approach is beautifully simple: max_hw_wp = kvm_check_extension(cs-kvm_state, KVM_CAP_GUEST_DEBUG_HW_WPS); max_hw_bp = kvm_check_extension(cs-kvm_state, KVM_CAP_GUEST_DEBUG_HW_BPS); thanks to kvm_check_extension zeroing failure modes. And in it's defence it's a generic enough capability to be used across any other architectures that need to expose this information compared to the many PPC specific capabilities. Perhaps a KVM_ARCH_EXTENSION ioctl would be more useful in reducing the growth of the space? Paolo -- Alex Bennée -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Exposing host debug capabilities to userspace
On 25 November 2014 at 16:50, Paolo Bonzini pbonz...@redhat.com wrote: On 25/11/2014 17:35, Alexander Graf wrote: Unfortunately on ARM you also have a few other constraints - the debug register space is partitioned into magic super debug registers at the top (with an implementation specific amount) and normal debug registers in the lower end of the space. Does the gdbstub ever need to use the magic super debug registers? No. The extra features of the trailing registers in the range are context matching, which means you can set them up as breakpoint on this context ID or breakpoint on this VMID [read: breakpoints specific to a particular process ID or particular VM], generally as linked breakpoints adding a constraint to a plain address-match breakpoint register. The gdbstub doesn't use context bps and I don't think Linux does either (though of course it might in future). -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Another Obsolete Fix me in trace.h?
2014-11-24 22:49+0100, Radim Krčmář: 2014-11-24 16:19-0500, Steven Rostedt: That wouldn't be too hard to implement. I'll look at the patch tommorrow. The kernel part is trivial. Most of the code is going to be in tools/lib/traceevent/event-parse.c. I wasn't sure whether to 1) define new 'enum print_arg_type' for our function and do exactly what other __print* did 2) extend existing PRINT_FUNC with variable arguments and register it as a global event handler as (2) makes more sense to me, but we already had it when some __print* was implemented ... (and I didn't want to dig too deep into the project) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-i386: add feature flags for CPUID[EAX=0xd,ECX=1]
On 26/11/2014 12:40, Eduardo Habkost wrote: On Wed, Nov 26, 2014 at 10:20:12AM +0100, Paolo Bonzini wrote: On 25/11/2014 21:02, Paolo Bonzini wrote: +static const char *cpuid_xsave_feature_name[] = { +xsaveopt, xsavec, xgetbv1, xsaves, None of the above features introduce any new state that might need to be migrated, or will require other changes in QEMU to work, right? It looks like they don't introduce any extra state, but if they do, they need to be added to unmigratable_flags until migration support is implemented. If they require other QEMU changes, it would be nice if KVM reported them using KVM_CHECK_EXTENSION instead of GET_SUPPORTED_CPUID, so it wouldn't break -cpu host. No, they don't. Actually, xsaves does but I don't think KVM_CHECK_EXTENSION is right. It's just another MSR, and we haven't used KVM_CHECK_EXTENSION for new MSRs and new XSAVE areas (last example: avx512). If the changes needed are only to support migration (this is the case if it's just another MSR handled by KVM, or additional XSAVE areas), GET_SUPPORTED_CPUID is still reasonable, because features that are unknown to QEMU are always considered unmigratable until we add the feature name to the feature_name arrays. (That's why we need to know if the feature introduces additional state when adding the feature names to the array.) If other changes are required to make the feature work even if no migration is required, then adding them to GET_SUPPORTED_CPUID would break -cpu host on older QEMUs. I don't think that's the case here, but I wanted to confirm. KVM may need more changes (I don't know, the details of the feature are not public yet), but a new userspace API is very unlikely based on Intel documentation. (Should we add those observations to Documentation/virtual/kvm/api.txt?) Since no hardware really exists for it, and KVM does not support it anyway, I think it's simplest to leave xsaves out for now. Is this right? If unsure, it won't hurt to add the feature to unmigratable_features by now. Making QEMU aware of the feature name is still useful. Ok, thanks. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CFT PATCH v2 0/2] KVM: support XSAVES usage in the host
On 26/11/2014 02:24, Wanpeng Li wrote: Hi all, On Tue, Nov 25, 2014 at 04:50:06PM +0200, Nadav Amit wrote: On Nov 25, 2014, at 16:17, Paolo Bonzini pbonz...@redhat.com wrote: On 25/11/2014 15:05, Nadav Amit wrote: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 373b0ab9a32e..ca26681455c2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6955,6 +6955,9 @@ int fx_init(struct kvm_vcpu *vcpu) return err; fpu_finit(vcpu-arch.guest_fpu); + if (cpu_has_xsaves) + vcpu-arch.guest_fpu.state-xsave.xsave_hdr.xcomp_bv = + host_xcr0 | XSTATE_COMPACTION_ENABLED; /* * Ensure guest xcr0 is valid for loading The second version works for me (w/qemu v2.1.0; Linux 3.13 guest). I Could you try 3.17 guest which has xsaves enabled? Because I'm not sure if the below codes from Paolo is enough to mask XSAVES, should we also add F(XSAVES)? + const u32 kvm_supported_word10_x86_features = + F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1); + In addition, the 3.17 guest is still hang as I mentioned even if I add the F(XSAVES) to the kvm_supported_word10_x86_features. Note that the point is _hiding_ XSAVES while leaving XSAVEOPT/XSAVEC/XGETBV1 enabled. This series only fixes the problems with xsaves in the host, it doesn't try making XSAVES work in the guest. I have another patch which enable xsaves in KVM and the patch is still under debug with Paolo's patch KVM: x86: support XSAVES usage in the host, so the 1/2 patch from Paolo can be dropped if my patch is ready. Anyway, a quick fix is needed before enable xsaves in kvm. Please add XSAVES to kvm_supported_word10_x86_features in your patches instead. We still need to mask any (not yet defined) feature in bit 4 and above of CPUID[EAX=0xd,ECX].EAX. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] KVM: arm: add irqfd support
On 11/26/2014 12:31 PM, Christoffer Dall wrote: On Tue, Nov 25, 2014 at 02:12:31PM +0100, Eric Auger wrote: On 11/25/2014 11:19 AM, Christoffer Dall wrote: [Re-adding cc list] On Mon, Nov 24, 2014 at 05:42:30PM +0100, Eric Auger wrote: On 11/24/2014 04:47 PM, Christoffer Dall wrote: On Mon, Nov 24, 2014 at 12:02 PM, Eric Auger eric.au...@linaro.org wrote: On 11/24/2014 11:00 AM, Christoffer Dall wrote: On Sun, Nov 23, 2014 at 06:56:59PM +0100, Eric Auger wrote: This patch enables irqfd on arm. Both irqfd and resamplefd are supported. Injection is implemented in vgic.c without routing. This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. Signed-off-by: Eric Auger eric.au...@linaro.org --- [...] +int kvm_set_irq(struct kvm *kvm, int irq_source_id, +u32 irq, int level, bool line_status) +{ +unsigned int spi = irq + VGIC_NR_PRIVATE_IRQS; + +kvm_debug(irqfd sets vIRQ %d to %d\n, irq, level); + +if (likely(vgic_initialized(kvm))) { +if (spi kvm-arch.vgic.nr_irqs) +return -EINVAL; +return kvm_vgic_inject_irq(kvm, 0, spi, level); +} else if (level kvm_irq_has_notifier(kvm, 0, irq)) { +/* + * any IRQ injected before vgic readiness is + * ignored and the notifier, if any, is called + * immediately as if the virtual IRQ were completed + * by the guest + */ +kvm_notify_acked_irq(kvm, 0, irq); +return -EAGAIN; This looks fishy, I don't fully understand which case you're catering towards here (the comment is hard to understand), but my gut feeling is that you should back out (probably with an error) if the vgic is not initialized when calling this function. Setting up the routing in the first place should probably error out if the vgic is not available. The issue is related to integration with QEMU and VFIO. Currently VFIO signaling (we tell VFIO to signal the eventfd on a peculiar physical IRQ) and irqfd setup (we tell KVM/IRQFD to handle previous eventfd when triggered and inject a GSI) are done by QEMU *before* the first vcpu run. so before VGIC is ready. Both are done in a so called QEMU machine init done notifier. It could be done in a QEMU reset notifier but I guess the problem would be the same. and I think the same at which QEMU initializes it is correct. As soon as both VFIO signaling and irqfd are setup, virtual IRQ are likely to be injected and this is what happens on some circumstances. This happens on the 2d QEMU run after killing the 1st QEMU session. For some reason I guess the HW device - in my case the xgmac - was released in such a way the IRQ wire was not reset. So as soon as VFIO driver calls request_irq, the IRQ hits. I tried to change that this xgmac driver behavior but I must acknowledge that for the time beeing I failed. I will continue investigating that. Indeed I might be fixing the issue at the wrong place. But anyway this relies on the fact the assigned device driver toggled down the IRQ properly when releasing. At the time the signaling is setup we have not entered yet any driver reset code. I see, it's because of the quirky way we initialize the vgic at time of running the first CPU, so user space can't really hook into the sweet spot after initializing the vgic but just before actually running guest code. Yes currently irqfd generic code has no way to test if the virtual interrupt controller is ready (KVM__IRQFD - kvm_irqfd_assign ) because I guess other archs don't have the problem. Could it be that we simply need to let user space init the vgic after creating all its vcpus and only then allow setting up the irqfd? I start VFIO signaling and setup KVM_IRQFD in a QEMU machine init done notifier. the vgic init then must happen before then. have all the vcpus that QEMU wants to create, been created by then? Hi Christoffer, My understanding of QEMU start sequence is as follows: at machine init, CPU realize function is called for each CPU (target-arm/cpu.c arm_cpu_realizefn) qemu_init_vcpu (cpus.c) qemu_kvm_start_vcpu (cpus.c) launch VCPU thread (qemu_kvm_cpu_thread_fn) kvm_init_vcpu kvm_init_vcpu does the while (1) loop issuing KVM_RUN ioctl if *cpu_can_run* cpu_can_run returns true when vm_start is called (resume_all_vcpuscpu_resume) QEMU machine init done or reset callbacks happen after machine init and before vm_start. The actual vgic readiness is set in the first vcpu run, ie on the vm_start. vgic instantiation in virt machine file is done after cpu_instantiation. We could force vgic init at that time, in the gic realize fn (kvm_arm_gic_realize)?. All vcpus are instantiated: nr_vcpus and nr_irqs are known. Also base addresses are known. Currently
Re: [PATCH 2/7] KVM: arm: guest debug, define API headers
Paolo Bonzini pbonz...@redhat.com writes: On 25/11/2014 18:13, Peter Maydell wrote: On 25 November 2014 at 17:05, Paolo Bonzini pbonz...@redhat.com wrote: So there is no register that says this breakpoint has triggered or this watchpoint has triggered? Nope. You take a debug exception; the syndrome register tells you if it was a bp or a wp, and if it was a wp the fault address register tells you the address being accessed (if it was a bp you know the program counter, obviously). The debugger is expected to be able to figure it out from there, if it cares. That's already good enough---do the KVM_DEBUG_EXIT_* constants match the syndrome register, or if not why? No they don't. I did consider it at the time but I was wary of pulling too much over into the uapi headers wholesale. If your happy to do that I'll include the change in my next version. I could also rationalise the exit handlers as they all pretty much do the same thing (save for the exit/syndrome related info). Again I was keeping things nicely separated in case any particular exception needed excessive special case handling. Would you like those changes? Paolo -- Alex Bennée -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host
On 26/11/2014 13:07, Radim Krčmář wrote: 2014-11-24 17:43+0100, Paolo Bonzini: Userspace is expecting non-compacted format for KVM_GET_XSAVE, but struct xsave_struct might be using the compacted format. Convert in order to preserve userspace ABI. Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE but the kernel will pass it to XRSTORS, and we need to convert back. Future instructions might force us to calling xsave/xrstor directly, so we could do that even now and save the explicit conversion ... What I mean is: we could be using the native xsave.*/xrstor.* while in kernel and use xsave/xrstor for communication with userspace. Hardware would take care of everything in the conversion. get_xsave = native_xrstor(guest_xsave); xsave(aligned_userspace_buffer) set_xsave = xrstor(aligned_userspace_buffer); native_xsave(guest_xsave) Could that work? It could, though it is more like get_fpu() native_xrstor(guest_xsave) xsave(buffer) put_fpu() and vice versa. Also, the userspace buffer is mos likely not aligned, so you need some kind of bounce buffer. It can be done if the CPUID turns out to be a bottleneck, apart from that it'd most likely be slower. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] KVM: arm: guest debug, define API headers
On 26/11/2014 14:13, Alex Bennée wrote: Paolo Bonzini pbonz...@redhat.com writes: On 25/11/2014 18:13, Peter Maydell wrote: On 25 November 2014 at 17:05, Paolo Bonzini pbonz...@redhat.com wrote: So there is no register that says this breakpoint has triggered or this watchpoint has triggered? Nope. You take a debug exception; the syndrome register tells you if it was a bp or a wp, and if it was a wp the fault address register tells you the address being accessed (if it was a bp you know the program counter, obviously). The debugger is expected to be able to figure it out from there, if it cares. That's already good enough---do the KVM_DEBUG_EXIT_* constants match the syndrome register, or if not why? No they don't. I did consider it at the time but I was wary of pulling too much over into the uapi headers wholesale. If your happy to do that I'll include the change in my next version. I could also rationalise the exit handlers as they all pretty much do the same thing (save for the exit/syndrome related info). Again I was keeping things nicely separated in case any particular exception needed excessive special case handling. Would you like those changes? Yes, please. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: Generate #UD when memory operand is required
Certain x86 instructions that use modrm operands only allow memory operand (i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact. Currently, the instructions that are such and are emulated by KVM are MOVBE, MOVNTPS, MOVNTPD and MOVNTI. MOVBE is the most blunt example, since it may be emulated by the host regardless of MMIO. The fix introduces a new group for handling such instructions, marking mod3 as illegal instruction. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 38 ++ 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 6a57157..3817334 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -123,6 +123,7 @@ #define Prefix (315) /* Instruction varies with 66/f2/f3 prefix */ #define RMExt (415) /* Opcode extension in ModRM r/m if mod == 3 */ #define Escape (515) /* Escape to coprocessor instruction */ +#define InstrDual (615) /* Alternate instruction decoding of mod == 3 */ #define Sse (118) /* SSE Vector instruction */ /* Generic ModRM decode. */ #define ModRM (119) @@ -211,6 +212,7 @@ struct opcode { const struct group_dual *gdual; const struct gprefix *gprefix; const struct escape *esc; + const struct instr_dual *idual; void (*fastop)(struct fastop *fake); } u; int (*check_perm)(struct x86_emulate_ctxt *ctxt); @@ -233,6 +235,11 @@ struct escape { struct opcode high[64]; }; +struct instr_dual { + struct opcode mod012; + struct opcode mod3; +}; + /* EFLAGS bit definitions. */ #define EFLG_ID (121) #define EFLG_VIP (120) @@ -3679,6 +3686,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt) #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) } #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) } #define GD(_f, _g) { .flags = ((_f) | GroupDual | ModRM), .u.gdual = (_g) } +#define ID(_f, _i) { .flags = ((_f) | InstrDual | ModRM), .u.idual = (_i) } #define E(_f, _e) { .flags = ((_f) | Escape | ModRM), .u.esc = (_e) } #define I(_f, _e) { .flags = (_f), .u.execute = (_e) } #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) } @@ -3840,8 +3848,12 @@ static const struct gprefix pfx_0f_6f_0f_7f = { I(Mmx, em_mov), I(Sse | Aligned, em_mov), N, I(Sse | Unaligned, em_mov), }; +static const struct instr_dual instr_dual_0f_2b = { + I(0, em_mov), N +}; + static const struct gprefix pfx_0f_2b = { - I(0, em_mov), I(0, em_mov), N, N, + ID(0, instr_dual_0f_2b), ID(0, instr_dual_0f_2b), N, N, }; static const struct gprefix pfx_0f_28_0f_29 = { @@ -3915,6 +3927,10 @@ static const struct escape escape_dd = { { N, N, N, N, N, N, N, N, } }; +static const struct instr_dual instr_dual_0f_c3 = { + I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov), N +}; + static const struct opcode opcode_table[256] = { /* 0x00 - 0x07 */ F6ALU(Lock, em_add), @@ -4117,7 +4133,7 @@ static const struct opcode twobyte_table[256] = { D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov), /* 0xC0 - 0xC7 */ F2bv(DstMem | SrcReg | ModRM | SrcWrite | Lock, em_xadd), - N, I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov), + N, ID(0, instr_dual_0f_c3), N, N, N, GD(0, group9), /* 0xC8 - 0xCF */ X8(I(DstReg, em_bswap)), @@ -4130,12 +4146,20 @@ static const struct opcode twobyte_table[256] = { N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N }; +static const struct instr_dual instr_dual_0f_38_f0 = { + I(DstReg | SrcMem | Mov, em_movbe), N +}; + +static const struct instr_dual instr_dual_0f_38_f1 = { + I(DstMem | SrcReg | Mov, em_movbe), N +}; + static const struct gprefix three_byte_0f_38_f0 = { - I(DstReg | SrcMem | Mov, em_movbe), N, N, N + ID(0, instr_dual_0f_38_f0), N, N, N }; static const struct gprefix three_byte_0f_38_f1 = { - I(DstMem | SrcReg | Mov, em_movbe), N, N, N + ID(0, instr_dual_0f_38_f1), N, N, N }; /* @@ -4536,6 +4560,12 @@ done_prefixes: else opcode = opcode.u.esc-op[(ctxt-modrm 3) 7]; break; + case InstrDual: + if ((ctxt-modrm 6) == 3) + opcode = opcode.u.idual-mod3; + else + opcode = opcode.u.idual-mod012; + break; default: return EMULATION_FAILED; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests] x86: Test illegal movbe
Previously KVM ignored the mod field of MOVBE instruction, so MOVBE from register to register succeeds, although it should fail (cause a #UD exception). This test check that a #UD is indeed delivered upon such MOVBE. The test would not work if MOVBE is unsupported. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- x86/emulator.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 5aa4dbf..709978b 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -1051,6 +1051,27 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void illegal_movbe_handler(struct ex_regs *regs) +{ + extern char bad_movbe_cont; + + ++exceptions; + regs-rip = (ulong)bad_movbe_cont; +} + +static void test_illegal_movbe(void) +{ + if (!(cpuid(1).c (1 22))) + printf(SKIP: illegal movbe\n); + + exceptions = 0; + handle_exception(UD_VECTOR, illegal_movbe_handler); + asm volatile(.byte 0x0f; .byte 0x38; .byte 0xf0; .byte 0xc0;\n\t + bad_movbe_cont: : : : rax); + report(illegal movbe, exceptions == 1); + handle_exception(UD_VECTOR, 0); +} + int main() { void *mem; @@ -1119,6 +1140,7 @@ int main() test_string_io_mmio(mem); test_jmp_noncanonical(mem); + test_illegal_movbe(); return report_summary(); } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Generate #UD when memory operand is required
On 26/11/2014 14:47, Nadav Amit wrote: Certain x86 instructions that use modrm operands only allow memory operand (i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact. Currently, the instructions that are such and are emulated by KVM are MOVBE, MOVNTPS, MOVNTPD and MOVNTI. MOVBE is the most blunt example, since it may be emulated by the host regardless of MMIO. The fix introduces a new group for handling such instructions, marking mod3 as illegal instruction. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 38 ++ 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 6a57157..3817334 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -123,6 +123,7 @@ #define Prefix (315) /* Instruction varies with 66/f2/f3 prefix */ #define RMExt (415) /* Opcode extension in ModRM r/m if mod == 3 */ #define Escape (515) /* Escape to coprocessor instruction */ +#define InstrDual (615) /* Alternate instruction decoding of mod == 3 */ #define Sse (118) /* SSE Vector instruction */ /* Generic ModRM decode. */ #define ModRM (119) @@ -211,6 +212,7 @@ struct opcode { const struct group_dual *gdual; const struct gprefix *gprefix; const struct escape *esc; + const struct instr_dual *idual; void (*fastop)(struct fastop *fake); } u; int (*check_perm)(struct x86_emulate_ctxt *ctxt); @@ -233,6 +235,11 @@ struct escape { struct opcode high[64]; }; +struct instr_dual { + struct opcode mod012; + struct opcode mod3; +}; + /* EFLAGS bit definitions. */ #define EFLG_ID (121) #define EFLG_VIP (120) @@ -3679,6 +3686,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt) #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) } #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) } #define GD(_f, _g) { .flags = ((_f) | GroupDual | ModRM), .u.gdual = (_g) } +#define ID(_f, _i) { .flags = ((_f) | InstrDual | ModRM), .u.idual = (_i) } #define E(_f, _e) { .flags = ((_f) | Escape | ModRM), .u.esc = (_e) } #define I(_f, _e) { .flags = (_f), .u.execute = (_e) } #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) } @@ -3840,8 +3848,12 @@ static const struct gprefix pfx_0f_6f_0f_7f = { I(Mmx, em_mov), I(Sse | Aligned, em_mov), N, I(Sse | Unaligned, em_mov), }; +static const struct instr_dual instr_dual_0f_2b = { + I(0, em_mov), N +}; + static const struct gprefix pfx_0f_2b = { - I(0, em_mov), I(0, em_mov), N, N, + ID(0, instr_dual_0f_2b), ID(0, instr_dual_0f_2b), N, N, }; static const struct gprefix pfx_0f_28_0f_29 = { @@ -3915,6 +3927,10 @@ static const struct escape escape_dd = { { N, N, N, N, N, N, N, N, } }; +static const struct instr_dual instr_dual_0f_c3 = { + I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov), N +}; + static const struct opcode opcode_table[256] = { /* 0x00 - 0x07 */ F6ALU(Lock, em_add), @@ -4117,7 +4133,7 @@ static const struct opcode twobyte_table[256] = { D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov), /* 0xC0 - 0xC7 */ F2bv(DstMem | SrcReg | ModRM | SrcWrite | Lock, em_xadd), - N, I(DstMem | SrcReg | ModRM | No16 | Mov, em_mov), + N, ID(0, instr_dual_0f_c3), N, N, N, GD(0, group9), /* 0xC8 - 0xCF */ X8(I(DstReg, em_bswap)), @@ -4130,12 +4146,20 @@ static const struct opcode twobyte_table[256] = { N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N }; +static const struct instr_dual instr_dual_0f_38_f0 = { + I(DstReg | SrcMem | Mov, em_movbe), N +}; + +static const struct instr_dual instr_dual_0f_38_f1 = { + I(DstMem | SrcReg | Mov, em_movbe), N +}; + static const struct gprefix three_byte_0f_38_f0 = { - I(DstReg | SrcMem | Mov, em_movbe), N, N, N + ID(0, instr_dual_0f_38_f0), N, N, N }; static const struct gprefix three_byte_0f_38_f1 = { - I(DstMem | SrcReg | Mov, em_movbe), N, N, N + ID(0, instr_dual_0f_38_f1), N, N, N }; /* @@ -4536,6 +4560,12 @@ done_prefixes: else opcode = opcode.u.esc-op[(ctxt-modrm 3) 7]; break; + case InstrDual: + if ((ctxt-modrm 6) == 3) + opcode = opcode.u.idual-mod3; + else + opcode = opcode.u.idual-mod012; + break; default: return EMULATION_FAILED; } Thanks, looks good (not applied yet). Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to
[GIT PULL] KVM changes for 3.18-rc7 (or final)
Linus, The following changes since commit b914c5b2130239fd378d1a719ab3c53f0c782be9: Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux (2014-11-25 19:05:41 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus for you to fetch changes up to d3fccc7ef831d1d829b4da5eaa081db55b1e38f3: kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn() (2014-11-26 14:40:45 +0100) Last minute KVM/ARM fixes; even the generic change actually affects nothing but ARM. Ard Biesheuvel (2): arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn() kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn() Christoffer Dall (2): arm64: KVM: Handle traps of ICC_SRE_EL1 as RAZ/WI arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create() Mark Rutland (1): arm64: KVM: fix unmapping with 48-bit VAs arch/arm/kvm/mmu.c| 10 -- arch/arm64/kvm/sys_regs.c | 9 + arch/ia64/kvm/kvm-ia64.c | 2 +- arch/x86/kvm/mmu.c| 6 +++--- include/linux/kvm_host.h | 2 +- virt/kvm/arm/vgic.c | 8 virt/kvm/kvm_main.c | 16 7 files changed, 34 insertions(+), 19 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host
2014-11-26 14:13+0100, Paolo Bonzini: On 26/11/2014 13:07, Radim Krčmář wrote: 2014-11-24 17:43+0100, Paolo Bonzini: Userspace is expecting non-compacted format for KVM_GET_XSAVE, but struct xsave_struct might be using the compacted format. Convert in order to preserve userspace ABI. Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE but the kernel will pass it to XRSTORS, and we need to convert back. Future instructions might force us to calling xsave/xrstor directly, so we could do that even now and save the explicit conversion ... What I mean is: we could be using the native xsave.*/xrstor.* while in kernel and use xsave/xrstor for communication with userspace. Hardware would take care of everything in the conversion. get_xsave = native_xrstor(guest_xsave); xsave(aligned_userspace_buffer) set_xsave = xrstor(aligned_userspace_buffer); native_xsave(guest_xsave) Could that work? It could, though it is more like get_fpu() native_xrstor(guest_xsave) xsave(buffer) put_fpu() and vice versa. Also, the userspace buffer is mos likely not aligned, so you need some kind of bounce buffer. It can be done if the CPUID turns out to be a bottleneck, apart from that it'd most likely be slower. Yeah, it was mostly making this code more future-proof ... it is easier to convince xsave.h to export its structures if CPUID is the problem. (I still see some hope for Linux, so performance isn't my primary goal.) I'm quite interested in CPUID now though, so I'll try to benchmark it, someday. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 25/42] vhost: add memory access wrappers
On Tue, 25 Nov 2014 18:43:10 +0200 Michael S. Tsirkin m...@redhat.com wrote: These wrappers are needed to handle virtio endianness conversions. ? Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/vhost.h | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 3eda654..b9032e8 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -174,6 +174,37 @@ enum { static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit) { - return vq-acked_features (1 bit); + return vq-acked_features (1ULL bit); Should this hunk go into patch 28? +} -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests] x86: Test illegal movbe
On 26/11/2014 14:48, Nadav Amit wrote: Previously KVM ignored the mod field of MOVBE instruction, so MOVBE from register to register succeeds, although it should fail (cause a #UD exception). This test check that a #UD is indeed delivered upon such MOVBE. The test would not work if MOVBE is unsupported. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- x86/emulator.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 5aa4dbf..709978b 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -1051,6 +1051,27 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void illegal_movbe_handler(struct ex_regs *regs) +{ + extern char bad_movbe_cont; + + ++exceptions; + regs-rip = (ulong)bad_movbe_cont; +} + +static void test_illegal_movbe(void) +{ + if (!(cpuid(1).c (1 22))) + printf(SKIP: illegal movbe\n); + + exceptions = 0; + handle_exception(UD_VECTOR, illegal_movbe_handler); + asm volatile(.byte 0x0f; .byte 0x38; .byte 0xf0; .byte 0xc0;\n\t + bad_movbe_cont: : : : rax); + report(illegal movbe, exceptions == 1); + handle_exception(UD_VECTOR, 0); +} + int main() { void *mem; @@ -1119,6 +1140,7 @@ int main() test_string_io_mmio(mem); test_jmp_noncanonical(mem); + test_illegal_movbe(); return report_summary(); } Reviewed-by: Paolo Bonzini pbonz...@redhat.com Thanks, Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host
On 26/11/2014 14:53, Radim Krčmář wrote: get_xsave = native_xrstor(guest_xsave); xsave(aligned_userspace_buffer) set_xsave = xrstor(aligned_userspace_buffer); native_xsave(guest_xsave) Could that work? It could, though it is more like get_fpu() native_xrstor(guest_xsave) xsave(buffer) put_fpu() and vice versa. Also, the userspace buffer is mos likely not aligned, so you need some kind of bounce buffer. It can be done if the CPUID turns out to be a bottleneck, apart from that it'd most likely be slower. Yeah, it was mostly making this code more future-proof ... it is easier to convince xsave.h to export its structures if CPUID is the problem. (I still see some hope for Linux, so performance isn't my primary goal.) I'm quite interested in CPUID now though, so I'll try to benchmark it, someday. I'm not sure what is more future proof. :) I wonder if native_xrstor could be a problem the day XRSTORS actually sets/restores MSRs as the processor documentation promises. We do not need that to pass them to userspace via KVM_GET/SET_XSAVE because we have KVM_GET/SET_MSR for that, but it may cause problems if get_xsave uses XRSTORS and thus sets the MSRs to unanticipated values. Difficult to say without more information on Intel's plans. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
With '-cpu host' in L1 nested guest fails: KVM: entry failed, hardware error 0x7
Hi, This error occurs when using using '-cpu host'. And, is consistently reproducible with Fedora 21's Kernels. One way to reproduce the issue -- a) Enable nested virt on host, I use this procedure[1] b) Boot a a guest (ensure /dev/kvm character device shows up) c) Invoke `libguestfs-test-tool` utility in L1 (which will run a QEMU appliance with -cpu host). You should see the failure in this log: ~/.cache/libvirt/qemu/log/guestfs-vegb4aor6a8d1r3e.log QEMU CLI: [. . .] 2014-11-26 12:16:02.449+: starting up LC_ALL=C PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/kashyapc/.local/bin:/home/kashyapc/bin:/usr/local/sbin:/usr/sbin:/sbin HOME=/home/kashyapc USER=kashyapc LOGNAME=kashyapc QEMU_AUDIO_DRV=none TMPDIR=/var/tmp /bin/qemu-kvm -name guestfs-vegb4aor6a8d1r3e -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu host -m 500 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 27869b93-0b83-4b57-a909-9265c677d307 -nographic -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/home/kashyapc/.config/libvirt/qemu/lib/guestfs-vegb4aor6a8d1r3e.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -no-acpi -boot strict=on -kernel /var/tmp/.guestfs-1000/appliance.d/kernel -initrd /var/tmp/.guestfs-1000/appliance.d/initrd -append panic=1 console=ttyS0 udevtimeout=6000 udev.event-timeout=6000 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=screen -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/tmp/libguestfsenp1an/devnull1,if=none,id=drive-scsi0-0-0-0,format=raw,cache=writeback -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file=/tmp/libguestfsenp1an/overlay2,if=none,id=drive-scsi0-0-1-0,format=qcow2,cache=unsafe -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 -chardev socket,id=charserial0,path=/tmp/libguestfsenp1an/console.sock -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/tmp/libguestfsenp1an/guestfsd.sock -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on Domain id=2 is tainted: custom-argv Domain id=2 is tainted: host-cpu KVM: entry failed, hardware error 0x7 RAX=000f RBX= RCX=038f RDX= RSI=000f RDI=038f RBP=88001ee4faa0 RSP=88001ee4faa0 R8 = R9 =88001f00cf60 R10=88001ee4fa48 R11=0005 R12=038f R13=81809390 R14= R15=88001f00ca40 RIP=8105b8ca RFL=0046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 00c0 CS =0010 00a09b00 DPL=0 CS64 [-RA] SS = 00c0 DS = 00c0 FS = 00c0 GS = 88001f00 00c0 LDT= 00c0 TR =0040 88001f012080 2087 8b00 DPL=0 TSS64-busy GDT= 88001f00a000 007f IDT= ff56b000 0fff CR0=8005003b CR2= CR3=01c11000 CR4=07f0 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER=0d01 Code=20 48 89 f8 48 09 f0 5d c3 90 55 89 f0 89 f9 48 89 e5 0f 30 31 c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 48 89 d7 89 c1 48 c1 e7 20 48 89 f8 48 09 c8 5d [. . .] I'll admit, with Fedora Rawhide Kernel, libvirt and QEMU versions I could not reproduce the issue: $ uname -r; rpm -q qemu-system-x86 libvirt-daemon-kvm 3.18.0-0.rc6.git0.1.fc22.x86_64 qemu-system-x86-2.2.0-0.1.rc1.fc22.x86_64 libvirt-daemon-kvm-1.2.10-3.fc22.x86_64 There's an existing bug for Fedora, here[2] [1] https://kashyapc.fedorapeople.org/virt/procedure-to-enable-nested-virt-on-intel-machines.txt [2] https://bugzilla.redhat.com/show_bug.cgi?id=1016748#c9 -- KVM: entry failed, hardware error 0x7 for nested kvm guest -- /kashyap -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 25/42] vhost: add memory access wrappers
On Wed, Nov 26, 2014 at 02:54:38PM +0100, Cornelia Huck wrote: On Tue, 25 Nov 2014 18:43:10 +0200 Michael S. Tsirkin m...@redhat.com wrote: These wrappers are needed to handle virtio endianness conversions. ? yes, it's same as virtio ones. I'll add this text, thanks. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/vhost.h | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 3eda654..b9032e8 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -174,6 +174,37 @@ enum { static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit) { - return vq-acked_features (1 bit); + return vq-acked_features (1ULL bit); Should this hunk go into patch 28? Well, this is needed here since 1 32 is not legal C. I can move it - this means patch 28 will have to move earlier in series though. +} -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Another Obsolete Fix me in trace.h?
On Wed, 26 Nov 2014 13:40:26 +0100 Radim Krčmář rkrc...@redhat.com wrote: 2014-11-24 22:49+0100, Radim Krčmář: 2014-11-24 16:19-0500, Steven Rostedt: That wouldn't be too hard to implement. I'll look at the patch tommorrow. The kernel part is trivial. Most of the code is going to be in tools/lib/traceevent/event-parse.c. I wasn't sure whether to 1) define new 'enum print_arg_type' for our function and do exactly what other __print* did 2) extend existing PRINT_FUNC with variable arguments and register it as a global event handler as (2) makes more sense to me, but we already had it when some __print* was implemented ... (and I didn't want to dig too deep into the project) Add a new hardcoded instance. The PRINT_FUNC is for plugins that have a unique function for the events they handle. Notice that event-parse does not define any func handlers. If this is to be generic, then lets make it specified as a new enum. Thanks, -- Steve -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 25/42] vhost: add memory access wrappers
On Wed, 26 Nov 2014 16:05:39 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Nov 26, 2014 at 02:54:38PM +0100, Cornelia Huck wrote: On Tue, 25 Nov 2014 18:43:10 +0200 Michael S. Tsirkin m...@redhat.com wrote: @@ -174,6 +174,37 @@ enum { static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit) { - return vq-acked_features (1 bit); + return vq-acked_features (1ULL bit); Should this hunk go into patch 28? Well, this is needed here since 1 32 is not legal C. I can move it - this means patch 28 will have to move earlier in series though. Yes, I think it makes sense to move patch 28 earlier. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] KVM: add commentary for kvm_debug_exit_arch struct
On Tue, Nov 25, 2014 at 04:09:59PM +, Alex Bennée wrote: Bring into line with the commentary for the other structures and their KVM_EXIT_* cases. Signed-off-by: Alex Bennée alex.ben...@linaro.org diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6076882..523f476 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -226,6 +226,7 @@ struct kvm_run { __u32 count; __u64 data_offset; /* relative to kvm_run start */ } io; + /* KVM_EXIT_DEBUG */ struct { struct kvm_debug_exit_arch arch; } debug; -- 2.1.3 Can we change this to a 'KVM: add commentary for kvm exit type data' patch? We're also missing /* KVM_EXIT_INTERNAL_ERROR */ and /* KVM_EXIT_PAPR_HCALL */ ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 25/42] vhost: add memory access wrappers
On Wed, Nov 26, 2014 at 03:17:50PM +0100, Cornelia Huck wrote: On Wed, 26 Nov 2014 16:05:39 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Nov 26, 2014 at 02:54:38PM +0100, Cornelia Huck wrote: On Tue, 25 Nov 2014 18:43:10 +0200 Michael S. Tsirkin m...@redhat.com wrote: @@ -174,6 +174,37 @@ enum { static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit) { - return vq-acked_features (1 bit); + return vq-acked_features (1ULL bit); Should this hunk go into patch 28? Well, this is needed here since 1 32 is not legal C. I can move it - this means patch 28 will have to move earlier in series though. Yes, I think it makes sense to move patch 28 earlier. Will do, thanks. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] KVM: arm: guest debug, define API headers
On Tue, Nov 25, 2014 at 04:10:00PM +, Alex Bennée wrote: This commit defines the API headers for guest debugging. There are two architecture specific debug structures: - kvm_guest_debug_arch, allows us to pass in HW debug registers - kvm_debug_exit_arch, signals the exact debug exit and address The type of debugging being used is control by the architecture specific control bits of the kvm_guest_debug-control flags in the ioctl structure. Signed-off-by: Alex Bennée alex.ben...@linaro.org diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h index 8e38878..de2450c 100644 --- a/arch/arm64/include/uapi/asm/kvm.h +++ b/arch/arm64/include/uapi/asm/kvm.h @@ -93,10 +93,30 @@ struct kvm_sregs { struct kvm_fpu { }; +/* See ARM ARM D7.3: Debug Registers + * + * The control registers are stored as 64bit values as the setup code + * is shared with the normal cpu context restore code in hyp.S which + * is 64 bit only. + */ +#define KVM_ARM_NDBG_REGS 16 struct kvm_guest_debug_arch { + __u64 dbg_bcr[KVM_ARM_NDBG_REGS]; + __u64 dbg_bvr[KVM_ARM_NDBG_REGS]; + __u64 dbg_wcr[KVM_ARM_NDBG_REGS]; + __u64 dbg_wvr[KVM_ARM_NDBG_REGS]; }; +/* Exit types which define why we did a debug exit */ +#define KVM_DEBUG_EXIT_ERROR 0x0 +#define KVM_DEBUG_EXIT_SINGLE_STEP 0x1 +#define KVM_DEBUG_EXIT_SW_BKPT 0x2 +#define KVM_DEBUG_EXIT_HW_BKPT 0x3 +#define KVM_DEBUG_EXIT_HW_WTPT 0x4 + struct kvm_debug_exit_arch { + __u64 address; + __u32 exit_type; }; struct kvm_sync_regs { @@ -198,4 +218,12 @@ struct kvm_arch_memory_slot { #endif +/* Architecture related debug defines - upper 16 bits of + * kvm_guest_debug-control + */ +#define KVM_GUESTDBG_USE_SW_BP_SHIFT 16 +#define KVM_GUESTDBG_USE_SW_BP (1 KVM_GUESTDBG_USE_SW_BP_SHIFT) +#define KVM_GUESTDBG_USE_HW_BP_SHIFT 17 +#define KVM_GUESTDBG_USE_HW_BP (1 KVM_GUESTDBG_USE_HW_BP_SHIFT) + I see this are defined in arch/x86/include/uapi/asm/kvm.h, so you needed to reproduce them here, but shouldn't they be promoted to include/uapi/linux/kvm.h instead? #endif /* __ARM_KVM_H__ */ -- 2.1.3 ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 26/42] vhost/net: force len for TX to host endian
On Tue, 25 Nov 2014 18:43:14 +0200 Michael S. Tsirkin m...@redhat.com wrote: We use native endian-ness internally but never expose it to guest. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/net.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8dae2f7..dce5c58 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Zero Copy TX; * status internally; used for zerocopy tx only. */ /* Lower device DMA failed */ -#define VHOST_DMA_FAILED_LEN 3 +#define VHOST_DMA_FAILED_LEN ((__force __virtio32)3) /* Lower device DMA done */ -#define VHOST_DMA_DONE_LEN 2 +#define VHOST_DMA_DONE_LEN ((__force __virtio32)2) /* Lower device DMA in progress */ -#define VHOST_DMA_IN_PROGRESS1 +#define VHOST_DMA_IN_PROGRESS((__force __virtio32)1) /* Buffer unused */ -#define VHOST_DMA_CLEAR_LEN 0 +#define VHOST_DMA_CLEAR_LEN ((__force __virtio32)0) I find these constants a bit confusing: What does __virtio32 mean without the context of a vq or device? -#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN) +#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) = (__force u32)VHOST_DMA_DONE_LEN) And here you cast it to a plain u32 again. I looked at the final code, and you seem either to use the above constants for .len or do a cpu_to_vhost32(). Wouldn't you need to convert the constants as well? enum { VHOST_NET_FEATURES = VHOST_FEATURES | -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/7] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl
On Tue, Nov 25, 2014 at 04:10:01PM +, Alex Bennée wrote: This commit adds a stub function to support the KVM_SET_GUEST_DEBUG ioctl. Currently any operation flag will return EINVAL. Actual functionality will be added with further patches. Technically the stub is already there, and you're extending it to start looking at control flags, but still not doing anything yet. Signed-off-by: Alex Bennée alex.ben...@linaro.org. diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 7610eaa..2c6386e 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2570,7 +2570,7 @@ handled. 4.87 KVM_SET_GUEST_DEBUG Capability: KVM_CAP_SET_GUEST_DEBUG -Architectures: x86, s390, ppc +Architectures: x86, s390, ppc, arm64 Type: vcpu ioctl Parameters: struct kvm_guest_debug (in) Returns: 0 on success; -1 on error diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 9e193c8..a0ff410 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -180,6 +180,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_PSCI: case KVM_CAP_ARM_PSCI_0_2: case KVM_CAP_READONLY_MEM: + case KVM_CAP_SET_GUEST_DEBUG: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -302,7 +303,34 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) { - return -EINVAL; + /* If it's not enabled clear all flags */ + if (!(dbg-control KVM_GUESTDBG_ENABLE)) { + vcpu-guest_debug = 0; + return 0; + } + + vcpu-guest_debug = dbg-control; + kvm_info(%s: guest_debug is 0x%lx\n, __func__, vcpu-guest_debug); + + /* Single Step */ + if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) { + kvm_info(SS requested, not yet implemented\n); + return -EINVAL; + } + + /* Software Break Points */ + if (vcpu-guest_debug KVM_GUESTDBG_USE_SW_BP) { + kvm_info(SW BP support requested, not yet implemented\n); + return -EINVAL; + } + + /* Hardware assisted Break and Watch points */ + if (vcpu-guest_debug KVM_GUESTDBG_USE_HW_BP) { + kvm_info(HW BP support requested, not yet implemented\n); + return -EINVAL; + } + + return 0; } I guess all the kvm_info's were useful for developing this patch series, but do we still need them? -- 2.1.3 ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 26/42] vhost/net: force len for TX to host endian
On Wed, Nov 26, 2014 at 03:31:02PM +0100, Cornelia Huck wrote: On Tue, 25 Nov 2014 18:43:14 +0200 Michael S. Tsirkin m...@redhat.com wrote: We use native endian-ness internally but never expose it to guest. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/net.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8dae2f7..dce5c58 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Zero Copy TX; * status internally; used for zerocopy tx only. */ /* Lower device DMA failed */ -#define VHOST_DMA_FAILED_LEN 3 +#define VHOST_DMA_FAILED_LEN ((__force __virtio32)3) /* Lower device DMA done */ -#define VHOST_DMA_DONE_LEN 2 +#define VHOST_DMA_DONE_LEN ((__force __virtio32)2) /* Lower device DMA in progress */ -#define VHOST_DMA_IN_PROGRESS 1 +#define VHOST_DMA_IN_PROGRESS ((__force __virtio32)1) /* Buffer unused */ -#define VHOST_DMA_CLEAR_LEN0 +#define VHOST_DMA_CLEAR_LEN((__force __virtio32)0) I find these constants a bit confusing: What does __virtio32 mean without the context of a vq or device? -#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN) +#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) = (__force u32)VHOST_DMA_DONE_LEN) And here you cast it to a plain u32 again. I looked at the final code, and you seem either to use the above constants for .len or do a cpu_to_vhost32(). Wouldn't you need to convert the constants as well? I tried to explain it in the commit message. It's a hack in vhost: it keeps virtio used structure in host memory, but abuses length field for internal housekeeping. This works because length in used ring for tx is always 0. enum { VHOST_NET_FEATURES = VHOST_FEATURES | -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host
2014-11-26 14:57+0100, Paolo Bonzini: On 26/11/2014 14:53, Radim Krčmář wrote: get_xsave = native_xrstor(guest_xsave); xsave(aligned_userspace_buffer) set_xsave = xrstor(aligned_userspace_buffer); native_xsave(guest_xsave) Could that work? It could, though it is more like get_fpu() native_xrstor(guest_xsave) xsave(buffer) put_fpu() and vice versa. Also, the userspace buffer is mos likely not aligned, so you need some kind of bounce buffer. It can be done if the CPUID turns out to be a bottleneck, apart from that it'd most likely be slower. Yeah, it was mostly making this code more future-proof ... it is easier to convince xsave.h to export its structures if CPUID is the problem. (I still see some hope for Linux, so performance isn't my primary goal.) I'm quite interested in CPUID now though, so I'll try to benchmark it, someday. (Sorry, I don't fully understand your thoughts and I just talk more of the same in those scenarios.) I'm not sure what is more future proof. :) I wonder if native_xrstor could be a problem the day XRSTORS actually sets/restores MSRs as the processor documentation promises. Isn't that a problem only for emulation? We do not need that to pass them to userspace via KVM_GET/SET_XSAVE because we have KVM_GET/SET_MSR for that, but it may cause problems if get_xsave uses XRSTORS and thus sets the MSRs to unanticipated values. KVM_GET/SET_XSAVE is defined to use the format of XSAVE/XRSTOR. (Userspace shouldn't know how we actually store guest's state; KVM_GET/SET_MSR doesn't read host's state.) XRSTORS won't affect the guest in any way, we are just going to use it to convert the xsave, so any side-effects are going to stay in the host. (This could break the host though.) Difficult to say without more information on Intel's plans. My main presumption is that XSAVE*-XRSTOR*-XSAVE-XRSTOR has the same result as XSAVE-XRSTOR, because we are only interested in the state, not in any metadata. (If it isn't possible to combine intructions, like XSAVE after XRSTORS, this solution won't work.) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Another Obsolete Fix me in trace.h?
2014-11-26 09:15-0500, Steven Rostedt: On Wed, 26 Nov 2014 13:40:26 +0100 Radim Krčmář rkrc...@redhat.com wrote: The kernel part is trivial. Most of the code is going to be in tools/lib/traceevent/event-parse.c. I wasn't sure whether to 1) define new 'enum print_arg_type' for our function and do exactly what other __print* did 2) extend existing PRINT_FUNC with variable arguments and register it as a global event handler as (2) makes more sense to me, but we already had it when some __print* was implemented ... (and I didn't want to dig too deep into the project) Add a new hardcoded instance. The PRINT_FUNC is for plugins that have a unique function for the events they handle. Notice that event-parse does not define any func handlers. If this is to be generic, then lets make it specified as a new enum. Ok, thanks. (It just seemed weird that we have an infrastructure that could allow a generic plugin to cover this functionality.) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 26/42] vhost/net: force len for TX to host endian
On Wed, 26 Nov 2014 16:44:00 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Nov 26, 2014 at 03:31:02PM +0100, Cornelia Huck wrote: On Tue, 25 Nov 2014 18:43:14 +0200 Michael S. Tsirkin m...@redhat.com wrote: We use native endian-ness internally but never expose it to guest. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/net.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8dae2f7..dce5c58 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Zero Copy TX; * status internally; used for zerocopy tx only. */ /* Lower device DMA failed */ -#define VHOST_DMA_FAILED_LEN 3 +#define VHOST_DMA_FAILED_LEN ((__force __virtio32)3) /* Lower device DMA done */ -#define VHOST_DMA_DONE_LEN 2 +#define VHOST_DMA_DONE_LEN ((__force __virtio32)2) /* Lower device DMA in progress */ -#define VHOST_DMA_IN_PROGRESS1 +#define VHOST_DMA_IN_PROGRESS((__force __virtio32)1) /* Buffer unused */ -#define VHOST_DMA_CLEAR_LEN 0 +#define VHOST_DMA_CLEAR_LEN ((__force __virtio32)0) I find these constants a bit confusing: What does __virtio32 mean without the context of a vq or device? -#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN) +#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) = (__force u32)VHOST_DMA_DONE_LEN) And here you cast it to a plain u32 again. I looked at the final code, and you seem either to use the above constants for .len or do a cpu_to_vhost32(). Wouldn't you need to convert the constants as well? I tried to explain it in the commit message. It's a hack in vhost: it keeps virtio used structure in host memory, but abuses length field for internal housekeeping. This works because length in used ring for tx is always 0. Ah, ok. It might make sense to add this explanation to the patch :) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] KVM: arm: guest debug, define API headers
Andrew Jones drjo...@redhat.com writes: On Tue, Nov 25, 2014 at 04:10:00PM +, Alex Bennée wrote: This commit defines the API headers for guest debugging. There are two architecture specific debug structures: snip +/* Architecture related debug defines - upper 16 bits of + * kvm_guest_debug-control + */ +#define KVM_GUESTDBG_USE_SW_BP_SHIFT16 +#define KVM_GUESTDBG_USE_SW_BP (1 KVM_GUESTDBG_USE_SW_BP_SHIFT) +#define KVM_GUESTDBG_USE_HW_BP_SHIFT17 +#define KVM_GUESTDBG_USE_HW_BP (1 KVM_GUESTDBG_USE_HW_BP_SHIFT) + I see this are defined in arch/x86/include/uapi/asm/kvm.h, so you needed to reproduce them here, but shouldn't they be promoted to include/uapi/linux/kvm.h instead? Well if we move them to common uapi we either restrict the $ARCH specific options that don't have SW/HW BKPTS (would be weird but...) or make them generic in the lower 16 bits (breaks API). But in principle I have no objection if other don't. -- Alex Bennée -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 26/42] vhost/net: force len for TX to host endian
On Wed, Nov 26, 2014 at 03:54:40PM +0100, Cornelia Huck wrote: On Wed, 26 Nov 2014 16:44:00 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Nov 26, 2014 at 03:31:02PM +0100, Cornelia Huck wrote: On Tue, 25 Nov 2014 18:43:14 +0200 Michael S. Tsirkin m...@redhat.com wrote: We use native endian-ness internally but never expose it to guest. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/net.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8dae2f7..dce5c58 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Zero Copy TX; * status internally; used for zerocopy tx only. */ /* Lower device DMA failed */ -#define VHOST_DMA_FAILED_LEN 3 +#define VHOST_DMA_FAILED_LEN ((__force __virtio32)3) /* Lower device DMA done */ -#define VHOST_DMA_DONE_LEN 2 +#define VHOST_DMA_DONE_LEN ((__force __virtio32)2) /* Lower device DMA in progress */ -#define VHOST_DMA_IN_PROGRESS 1 +#define VHOST_DMA_IN_PROGRESS ((__force __virtio32)1) /* Buffer unused */ -#define VHOST_DMA_CLEAR_LEN0 +#define VHOST_DMA_CLEAR_LEN((__force __virtio32)0) I find these constants a bit confusing: What does __virtio32 mean without the context of a vq or device? -#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN) +#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) = (__force u32)VHOST_DMA_DONE_LEN) And here you cast it to a plain u32 again. I looked at the final code, and you seem either to use the above constants for .len or do a cpu_to_vhost32(). Wouldn't you need to convert the constants as well? I tried to explain it in the commit message. It's a hack in vhost: it keeps virtio used structure in host memory, but abuses length field for internal housekeeping. This works because length in used ring for tx is always 0. Ah, ok. It might make sense to add this explanation to the patch :) Absolutely. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/7] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl
Andrew Jones drjo...@redhat.com writes: On Tue, Nov 25, 2014 at 04:10:01PM +, Alex Bennée wrote: This commit adds a stub function to support the KVM_SET_GUEST_DEBUG ioctl. Currently any operation flag will return EINVAL. Actual functionality will be added with further patches. Technically the stub is already there, and you're extending it to start looking at control flags, but still not doing anything yet. Sure we do: int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) { -return -EINVAL; +/* If it's not enabled clear all flags */ +if (!(dbg-control KVM_GUESTDBG_ENABLE)) { +vcpu-guest_debug = 0; +return 0; +} That's some class non-functionality right there ;-) +vcpu-guest_debug = dbg-control; +kvm_info(%s: guest_debug is 0x%lx\n, __func__, vcpu-guest_debug); + +/* Single Step */ +if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) { +kvm_info(SS requested, not yet implemented\n); +return -EINVAL; +} + +/* Software Break Points */ +if (vcpu-guest_debug KVM_GUESTDBG_USE_SW_BP) { +kvm_info(SW BP support requested, not yet implemented\n); +return -EINVAL; +} + +/* Hardware assisted Break and Watch points */ +if (vcpu-guest_debug KVM_GUESTDBG_USE_HW_BP) { +kvm_info(HW BP support requested, not yet implemented\n); +return -EINVAL; +} + +return 0; } I guess all the kvm_info's were useful for developing this patch series, but do we still need them? They also served the very useful roll of stopping checkpatch.pl bitching about my reluctance to remove braces from the if () { } clauses. However I take your point. I can certainly remove the kvm_info() statements as each bit of functionality is added while leaving this one to help when someone is bisecting and confused. -- Alex Bennée -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] KVM: arm: guest debug, define API headers
Peter Maydell peter.mayd...@linaro.org writes: On 25 November 2014 at 16:10, Alex Bennée alex.ben...@linaro.org wrote: +/* Exit types which define why we did a debug exit */ +#define KVM_DEBUG_EXIT_ERROR 0x0 +#define KVM_DEBUG_EXIT_SINGLE_STEP 0x1 +#define KVM_DEBUG_EXIT_SW_BKPT 0x2 +#define KVM_DEBUG_EXIT_HW_BKPT 0x3 +#define KVM_DEBUG_EXIT_HW_WTPT 0x4 The names of these imply that they're generic, but they're defined in an arch-specific header file... Yeah, I think these will die and I'll just export the syndrome information directly to QEMU. -- Alex Bennée -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: Fix reserved x2apic registers
x2APIC has no registers for DFR and ICR (see Intel SDM 10.12.1.2 x2APIC Register Address Space). KVM needs to cause #GP on such accesses. Fix it. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/lapic.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e0e5642..7611b75 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1897,6 +1897,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data) if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic)) return 1; + if (reg == APIC_DFR || reg == APIC_ICR2) { + apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n, + reg); + return 1; + } + if (apic_reg_read(apic, reg, 4, low)) return 1; if (msr == 0x830) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Another Obsolete Fix me in trace.h?
On Wed, 26 Nov 2014 15:49:34 +0100 Radim Krčmář rkrc...@redhat.com wrote: (It just seemed weird that we have an infrastructure that could allow a generic plugin to cover this functionality.) Maybe in the future we may do something like that. But for now, think of these as optimized ;-) -- Steve -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Fix reserved x2apic registers
The commit message has a typo - it should be “DFR and ICR2”. I didn’t post a new version in order not to confuse. Nadav Nadav Amit na...@cs.technion.ac.il wrote: x2APIC has no registers for DFR and ICR (see Intel SDM 10.12.1.2 x2APIC Register Address Space). KVM needs to cause #GP on such accesses. Fix it. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/lapic.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e0e5642..7611b75 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1897,6 +1897,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data) if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic)) return 1; + if (reg == APIC_DFR || reg == APIC_ICR2) { + apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n, +reg); + return 1; + } + if (apic_reg_read(apic, reg, 4, low)) return 1; if (msr == 0x830) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Fix reserved x2apic registers
2014-11-26 17:11+0200, Nadav Amit: x2APIC has no registers for DFR and ICR (see Intel SDM 10.12.1.2 x2APIC Register Address Space). KVM needs to cause #GP on such accesses. We should to take care of write to ICR2 as well. Fix it. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/lapic.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e0e5642..7611b75 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1897,6 +1897,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data) if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic)) return 1; + if (reg == APIC_DFR || reg == APIC_ICR2) { + apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n, +reg); + return 1; + } + if (apic_reg_read(apic, reg, 4, low)) return 1; if (msr == 0x830) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] KVM: x86: Fix reserved x2apic registers
x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 x2APIC Register Address Space). KVM needs to cause #GP on such accesses. Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes). Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- v1-v2: Cause #GP on ICR2 write access (as Radim pointed) --- arch/x86/kvm/lapic.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e0e5642..9c90d31 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1883,6 +1883,9 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic)) return 1; + if (reg == APIC_ICR2) + return 1; + /* if this is ICR write vector before command */ if (msr == 0x830) apic_reg_write(apic, APIC_ICR2, (u32)(data 32)); @@ -1897,6 +1900,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data) if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic)) return 1; + if (reg == APIC_DFR || reg == APIC_ICR2) { + apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n, + reg); + return 1; + } + if (apic_reg_read(apic, reg, 4, low)) return 1; if (msr == 0x830) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: x86: Fix reserved x2apic registers
2014-11-26 17:56+0200, Nadav Amit: x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 x2APIC Register Address Space). KVM needs to cause #GP on such accesses. Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes). Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- Reviewed-by: Radim Krčmář rkrc...@redhat.com v1-v2: Cause #GP on ICR2 write access (as Radim pointed) --- arch/x86/kvm/lapic.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e0e5642..9c90d31 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1883,6 +1883,9 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic)) return 1; + if (reg == APIC_ICR2) + return 1; + /* if this is ICR write vector before command */ if (msr == 0x830) apic_reg_write(apic, APIC_ICR2, (u32)(data 32)); @@ -1897,6 +1900,12 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data) if (!irqchip_in_kernel(vcpu-kvm) || !apic_x2apic_mode(apic)) return 1; + if (reg == APIC_DFR || reg == APIC_ICR2) { + apic_debug(KVM_APIC_READ: read x2apic reserved register %x\n, +reg); + return 1; + } + if (apic_reg_read(apic, reg, 4, low)) return 1; if (msr == 0x830) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: use MSR_ICR instead of a number
0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- This applies on top of Amit's [PATCH v2] KVM: x86: Fix reserved x2apic registers in which I noticed this minor deficit. arch/x86/kvm/lapic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 9c90d31..687874f 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1887,7 +1887,7 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data) return 1; /* if this is ICR write vector before command */ - if (msr == 0x830) + if (reg == APIC_ICR) apic_reg_write(apic, APIC_ICR2, (u32)(data 32)); return apic_reg_write(apic, reg, (u32)data); } @@ -1908,7 +1908,7 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data) if (apic_reg_read(apic, reg, 4, low)) return 1; - if (msr == 0x830) + if (reg == APIC_ICR) apic_reg_read(apic, APIC_ICR2, 4, high); *data = (((u64)high) 32) | low; -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] KVM: arm64: guest debug, add SW break point support
On Tue, Nov 25, 2014 at 04:10:02PM +, Alex Bennée wrote: This adds support for SW breakpoints inserted by userspace. First we need to trap all BKPT exceptions in the hypervisor (ELS). This in controlled through the MDCR_EL2 register. I've added a new field to the vcpu structure to hold this value. There should be scope to rationlise this with the VCPU_DEBUG_FLAGS/KVM_ARM64_DEBUG_DIRTY_SHIFT manipulation in later patches. I think we should start using the new mdcr_el2 field everywhere we plan to within the same series that it is introduced. Otherwise it's hard to tell if we need an mdcr_el2 field, or if a more generic field would suffice. We can always translate bits of a more generic field to mdcr_el2 bits when necessary, but not the reverse. Once the exception arrives we triggers an exit from the main hypervisor s/triggers/trigger/ loop to the hypervisor controller. The kvm_debug_exit_arch carries the address of the exception. If the controller doesn't know of breakpoint ^ a then we have a guest inserted breakpoint and the hypervisor needs to start again and deliver the exception to guest. Signed-off-by: Alex Bennée alex.ben...@linaro.org diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 2c6386e..9383359 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2592,7 +2592,7 @@ when running. Common control bits are: The top 16 bits of the control field are architecture specific control flags which can include the following: - - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86] + - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64] - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390] - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86] - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86] diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index a0ff410..48d26bb 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -303,6 +303,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) { + /* Route debug traps to el2? */ + bool route_el2 = false; + /* If it's not enabled clear all flags */ if (!(dbg-control KVM_GUESTDBG_ENABLE)) { vcpu-guest_debug = 0; @@ -320,8 +323,8 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, /* Software Break Points */ if (vcpu-guest_debug KVM_GUESTDBG_USE_SW_BP) { - kvm_info(SW BP support requested, not yet implemented\n); - return -EINVAL; + kvm_info(SW BP support requested\n); + route_el2 = true; } /* Hardware assisted Break and Watch points */ @@ -330,6 +333,20 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, return -EINVAL; } + /* If we are going to handle any debug exceptions we need to + * set MDCE_EL2.TDE to ensure they are routed to the + * hypervisor. If userspace determines the exception was + * actually internal to the guest it needs to handle + * re-injecting the exception. + */ kernel comment blocks typically start with an empty line, e.g. /* * comment block */ + if (route_el2) { + kvm_info(routing debug exceptions); + vcpu-arch.mdcr_el2 = MDCR_EL2_TDE; + } else { + kvm_info(not routing debug exceptions); + vcpu-arch.mdcr_el2 = 0; + } This looks weird because we're only managing some of the mdcr_el2 bits with the mdcr_el2 field. If we were managing all of them then these would need to be |= MDCR_EL2_TDE and = ~SOME_MASK instead. If we never plan to manage all the bits, then I think that points more towards the need for a more generic field instead. + return 0; } diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 2012c4b..38b0f07 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -98,6 +98,7 @@ struct kvm_vcpu_arch { /* HYP configuration */ u64 hcr_el2; + u32 mdcr_el2; /* Exception Information */ struct kvm_vcpu_fault_info fault; diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 9a9fce0..8da1043 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -122,6 +122,7 @@ int main(void) DEFINE(VCPU_HPFAR_EL2, offsetof(struct kvm_vcpu, arch.fault.hpfar_el2)); DEFINE(VCPU_DEBUG_FLAGS, offsetof(struct kvm_vcpu, arch.debug_flags)); DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2)); + DEFINE(VCPU_MDCR_EL2, offsetof(struct kvm_vcpu,
Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host
On 26/11/2014 15:42, Radim Krčmář wrote: 2014-11-26 14:57+0100, Paolo Bonzini: On 26/11/2014 14:53, Radim Krčmář wrote: get_xsave = native_xrstor(guest_xsave); xsave(aligned_userspace_buffer) set_xsave = xrstor(aligned_userspace_buffer); native_xsave(guest_xsave) Could that work? It could, though it is more like get_fpu() native_xrstor(guest_xsave) xsave(buffer) put_fpu() and vice versa. Also, the userspace buffer is mos likely not aligned, so you need some kind of bounce buffer. It can be done if the CPUID turns out to be a bottleneck, apart from that it'd most likely be slower. Yeah, it was mostly making this code more future-proof ... it is easier to convince xsave.h to export its structures if CPUID is the problem. (I still see some hope for Linux, so performance isn't my primary goal.) I'm quite interested in CPUID now though, so I'll try to benchmark it, someday. (Sorry, I don't fully understand your thoughts and I just talk more of the same in those scenarios.) I'm not sure what is more future proof. :) I wonder if native_xrstor could be a problem the day XRSTORS actually sets/restores MSRs as the processor documentation promises. XRSTORS won't affect the guest in any way, we are just going to use it to convert the xsave, so any side-effects are going to stay in the host. (This could break the host though.) Yes, that's the problem. :) My main presumption is that XSAVE*-XRSTOR*-XSAVE-XRSTOR has the same result as XSAVE-XRSTOR, because we are only interested in the state, not in any metadata. (If it isn't possible to combine intructions, like XSAVE after XRSTORS, this solution won't work.) Yes, that should be right. But actually what KVM would do it is XRSTOR-XSAVE*-XRSTOR*-XSAVE. The problem here is the side effects of doing XRSTORS far from a guest entry... though that would likely be handled by load_guest_fpu/put_guest_fpu. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7] KVM: arm64: guest debug, add support for single-step
On Tue, Nov 25, 2014 at 04:10:03PM +, Alex Bennée wrote: This adds support for single-stepping the guest. As userspace can and will manipulate guest registers before restarting any tweaking of the registers has to occur just before control is passed back to the guest. Furthermore while guest debugging is in effect we need to squash the ability of the guest to single-step itself as we have no easy way of re-entering the guest after the exception has been delivered to the hypervisor. Signed-off-by: Alex Bennée alex.ben...@linaro.org diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 48d26bb..a76daae 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -38,6 +38,7 @@ #include asm/tlbflush.h #include asm/cacheflush.h #include asm/virt.h +#include asm/debug-monitors.h #include asm/kvm_arm.h #include asm/kvm_asm.h #include asm/kvm_mmu.h @@ -300,6 +301,17 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) kvm_arm_set_running_vcpu(NULL); } +/** + * kvm_arch_vcpu_ioctl_set_guest_debug - Setup guest debugging + * @kvm: pointer to the KVM struct + * @kvm_guest_debug: the ioctl data buffer + * + * This sets up the VM for guest debugging. Care has to be taken when + * manipulating guest registers as these will be set/cleared by the + * hyper-visor controller, typically before each kvm_run event. As a + * result modification of the guest registers needs to take place + * after they have been restored in the hyp.S trampoline code. + */ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) { @@ -317,8 +329,8 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, /* Single Step */ if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) { - kvm_info(SS requested, not yet implemented\n); - return -EINVAL; + kvm_info(SS requested\n); + route_el2 = true; } /* Software Break Points */ diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 8da1043..78e5ae1 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -121,6 +121,7 @@ int main(void) DEFINE(VCPU_FAR_EL2, offsetof(struct kvm_vcpu, arch.fault.far_el2)); DEFINE(VCPU_HPFAR_EL2, offsetof(struct kvm_vcpu, arch.fault.hpfar_el2)); DEFINE(VCPU_DEBUG_FLAGS, offsetof(struct kvm_vcpu, arch.debug_flags)); + DEFINE(GUEST_DEBUG,offsetof(struct kvm_vcpu, guest_debug)); DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2)); DEFINE(VCPU_MDCR_EL2, offsetof(struct kvm_vcpu, arch.mdcr_el2)); DEFINE(VCPU_IRQ_LINES, offsetof(struct kvm_vcpu, arch.irq_lines)); diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c index 28dc92b..6def054 100644 --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -91,6 +91,25 @@ static int kvm_handle_bkpt(struct kvm_vcpu *vcpu, struct kvm_run *run) return 0; } +/** + * kvm_handle_ss - handle single step exceptions + * + * @vcpu:the vcpu pointer same @run comment as other handler header in previous patch + * + * See: ARM ARM D2.12 for the details. While the host is routing debug + * exceptions to it's handlers we have to suppress the ability of the + * guest to trigger exceptions. + */ +static int kvm_handle_ss(struct kvm_vcpu *vcpu, struct kvm_run *run) +{ + WARN_ON(!(vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP)); I'm not sure about this WARN_ON. Is there some scenario you were thinking of when you put it here? Is there some scenario where this could trigger so frequently we kill the log buffer? + + run-exit_reason = KVM_EXIT_DEBUG; + run-debug.arch.exit_type = KVM_DEBUG_EXIT_SINGLE_STEP; + run-debug.arch.address = *vcpu_pc(vcpu); + return 0; +} + static exit_handle_fn arm_exit_handlers[] = { [ESR_EL2_EC_WFI]= kvm_handle_wfx, [ESR_EL2_EC_CP15_32]= kvm_handle_cp15_32, @@ -105,6 +124,7 @@ static exit_handle_fn arm_exit_handlers[] = { [ESR_EL2_EC_SYS64] = kvm_handle_sys_reg, [ESR_EL2_EC_IABT] = kvm_handle_guest_abort, [ESR_EL2_EC_DABT] = kvm_handle_guest_abort, + [ESR_EL2_EC_SOFTSTP]= kvm_handle_ss, [ESR_EL2_EC_BKPT32] = kvm_handle_bkpt, [ESR_EL2_EC_BRK64] = kvm_handle_bkpt, }; diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S index 3c733ea..c0bc218 100644 --- a/arch/arm64/kvm/hyp.S +++ b/arch/arm64/kvm/hyp.S @@ -16,6 +16,7 @@ */ #include linux/linkage.h +#include linux/kvm.h #include asm/assembler.h #include asm/memory.h @@ -168,6 +169,31 @@ // x19-x29, lr, sp*, elr*, spsr* restore_common_regs + // After restoring the guest registers but before we return to the guest + // we may want to make some final
Re: [PATCH 2/7] KVM: arm: guest debug, define API headers
On 26/11/2014 15:58, Alex Bennée wrote: Andrew Jones drjo...@redhat.com writes: On Tue, Nov 25, 2014 at 04:10:00PM +, Alex Bennée wrote: This commit defines the API headers for guest debugging. There are two architecture specific debug structures: snip +/* Architecture related debug defines - upper 16 bits of + * kvm_guest_debug-control + */ +#define KVM_GUESTDBG_USE_SW_BP_SHIFT 16 +#define KVM_GUESTDBG_USE_SW_BP (1 KVM_GUESTDBG_USE_SW_BP_SHIFT) +#define KVM_GUESTDBG_USE_HW_BP_SHIFT 17 +#define KVM_GUESTDBG_USE_HW_BP (1 KVM_GUESTDBG_USE_HW_BP_SHIFT) + I see this are defined in arch/x86/include/uapi/asm/kvm.h, so you needed to reproduce them here, but shouldn't they be promoted to include/uapi/linux/kvm.h instead? Well if we move them to common uapi we either restrict the $ARCH specific options that don't have SW/HW BKPTS (would be weird but...) or make them generic in the lower 16 bits (breaks API). But in principle I have no objection if other don't. I think it's a matter of personal taste. Architecture-specific means not all architectures may support it, but it's certainly a good idea to reuse the same value if multiple architectures do support a #define. What you did is fine, another possibility is to do #define __KVM_GUESTDBG_USE_SW_BP (1 16) in include/uapi/linux/kvm.h, and #define KVM_GUESTDBG_USE_SW_BP __KVM_GUESTDBG_USE_SW_BP in the arch-specific file. Andrew, is this closer to what you intended? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] KVM: arm64: re-factor hyp.S debug register code
On Tue, Nov 25, 2014 at 04:10:04PM +, Alex Bennée wrote: This is a pre-cursor to sharing the code with the guest debug support. This replaces the big macro that fishes data out of a fixed location with a more general helper macro to restore a set of debug registers. It uses macro substitution so it can be re-used for debug control and value registers. It does however rely on the debug registers being 64 bit aligned (as they happen to be in the hyp ABI). Signed-off-by: Alex Bennée alex.ben...@linaro.org diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S index c0bc218..b38ce3d 100644 --- a/arch/arm64/kvm/hyp.S +++ b/arch/arm64/kvm/hyp.S @@ -490,65 +490,12 @@ msr mdscr_el1, x25 .endm -.macro restore_debug - // x2: base address for cpu context - // x3: tmp register - - mrs x26, id_aa64dfr0_el1 - ubfxx24, x26, #12, #4 // Extract BRPs - ubfxx25, x26, #20, #4 // Extract WRPs - mov w26, #15 - sub w24, w26, w24 // How many BPs to skip - sub w25, w26, w25 // How many WPs to skip - - add x3, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1) - - adr x26, 1f - add x26, x26, x24, lsl #2 - br x26 -1: - ldr x20, [x3, #(15 * 8)] - ldr x19, [x3, #(14 * 8)] - ldr x18, [x3, #(13 * 8)] - ldr x17, [x3, #(12 * 8)] - ldr x16, [x3, #(11 * 8)] - ldr x15, [x3, #(10 * 8)] - ldr x14, [x3, #(9 * 8)] - ldr x13, [x3, #(8 * 8)] - ldr x12, [x3, #(7 * 8)] - ldr x11, [x3, #(6 * 8)] - ldr x10, [x3, #(5 * 8)] - ldr x9, [x3, #(4 * 8)] - ldr x8, [x3, #(3 * 8)] - ldr x7, [x3, #(2 * 8)] - ldr x6, [x3, #(1 * 8)] - ldr x5, [x3, #(0 * 8)] - +.macro setup_debug_registers type +// x3: pointer to register set +// x4: number of registers to copy +// x5..x20/x26 trashed ^ comma instead? adr x26, 1f - add x26, x26, x24, lsl #2 - br x26 -1: - msr dbgbcr15_el1, x20 - msr dbgbcr14_el1, x19 - msr dbgbcr13_el1, x18 - msr dbgbcr12_el1, x17 - msr dbgbcr11_el1, x16 - msr dbgbcr10_el1, x15 - msr dbgbcr9_el1, x14 - msr dbgbcr8_el1, x13 - msr dbgbcr7_el1, x12 - msr dbgbcr6_el1, x11 - msr dbgbcr5_el1, x10 - msr dbgbcr4_el1, x9 - msr dbgbcr3_el1, x8 - msr dbgbcr2_el1, x7 - msr dbgbcr1_el1, x6 - msr dbgbcr0_el1, x5 - - add x3, x2, #CPU_SYSREG_OFFSET(DBGBVR0_EL1) - - adr x26, 1f - add x26, x26, x24, lsl #2 + add x26, x26, x4, lsl #2 br x26 1: ldr x20, [x3, #(15 * 8)] @@ -569,116 +516,25 @@ ldr x5, [x3, #(0 * 8)] adr x26, 1f - add x26, x26, x24, lsl #2 - br x26 -1: - msr dbgbvr15_el1, x20 - msr dbgbvr14_el1, x19 - msr dbgbvr13_el1, x18 - msr dbgbvr12_el1, x17 - msr dbgbvr11_el1, x16 - msr dbgbvr10_el1, x15 - msr dbgbvr9_el1, x14 - msr dbgbvr8_el1, x13 - msr dbgbvr7_el1, x12 - msr dbgbvr6_el1, x11 - msr dbgbvr5_el1, x10 - msr dbgbvr4_el1, x9 - msr dbgbvr3_el1, x8 - msr dbgbvr2_el1, x7 - msr dbgbvr1_el1, x6 - msr dbgbvr0_el1, x5 - - add x3, x2, #CPU_SYSREG_OFFSET(DBGWCR0_EL1) - - adr x26, 1f - add x26, x26, x25, lsl #2 + add x26, x26, x4, lsl #2 br x26 1: - ldr x20, [x3, #(15 * 8)] - ldr x19, [x3, #(14 * 8)] - ldr x18, [x3, #(13 * 8)] - ldr x17, [x3, #(12 * 8)] - ldr x16, [x3, #(11 * 8)] - ldr x15, [x3, #(10 * 8)] - ldr x14, [x3, #(9 * 8)] - ldr x13, [x3, #(8 * 8)] - ldr x12, [x3, #(7 * 8)] - ldr x11, [x3, #(6 * 8)] - ldr x10, [x3, #(5 * 8)] - ldr x9, [x3, #(4 * 8)] - ldr x8, [x3, #(3 * 8)] - ldr x7, [x3, #(2 * 8)] - ldr x6, [x3, #(1 * 8)] - ldr x5, [x3, #(0 * 8)] - - adr x26, 1f - add x26, x26, x25, lsl #2 - br x26 -1: - msr dbgwcr15_el1, x20 - msr dbgwcr14_el1, x19 - msr dbgwcr13_el1, x18 - msr dbgwcr12_el1, x17 - msr dbgwcr11_el1, x16 - msr dbgwcr10_el1, x15 - msr dbgwcr9_el1, x14 - msr dbgwcr8_el1, x13 - msr dbgwcr7_el1, x12 - msr dbgwcr6_el1, x11 - msr dbgwcr5_el1, x10 - msr dbgwcr4_el1, x9 - msr dbgwcr3_el1, x8 - msr dbgwcr2_el1, x7 - msr dbgwcr1_el1, x6 - msr dbgwcr0_el1, x5 - - add x3, x2, #CPU_SYSREG_OFFSET(DBGWVR0_EL1) - - adr x26, 1f - add x26, x26, x25, lsl #2 - br x26 -1: -
Re: [RFC PATCH 4/5] ARM: enable linking against eventfd and irqchip
On 11/24/2014 10:27 PM, Nikolay Nikolaev wrote: This enables compilation of the eventfd feature on ARM. Hi Nikolay, why irqchip in the title? Best Regards Eric Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com --- arch/arm/kvm/Kconfig |1 + arch/arm/kvm/Makefile |2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 466bd29..a4b0312 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -20,6 +20,7 @@ config KVM bool Kernel-based Virtual Machine (KVM) support select PREEMPT_NOTIFIERS select ANON_INODES + select HAVE_KVM_EVENTFD select HAVE_KVM_CPU_RELAX_INTERCEPT select KVM_MMIO select KVM_ARM_HOST diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile index 443b8be..539c1a5 100644 --- a/arch/arm/kvm/Makefile +++ b/arch/arm/kvm/Makefile @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt) KVM := ../../../virt/kvm -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o obj-y += kvm-arm.o init.o interrupts.o obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
Paolo Bonzini pbonz...@redhat.com wrote: On 06/11/2014 17:45, Radim Krčmář wrote: 2014-11-06 10:34+0100, Paolo Bonzini: On 05/11/2014 21:45, Nadav Amit wrote: If I understand the SDM correctly, in such scenario (all APICs are software disabled) the mode is left as the default - flat mode (see APIC doesn't have any global mode (it is just KVM's simplification), so when a message lands on the system bus, it just compares MDA with LDR and DFR ... section 10.6.2.2 Logical Destination Mode”): All processors that have their APIC software enabled (using the spurious vector enable/disable bit) must have their DFRs (Destination Format Registers) programmed identically. The default mode for DFR is flat mode.” I think the default mode points to reset state, which is flat DFR; and it is mentioned only because of the following sentence If you are using cluster mode, DFRs must be programmed before the APIC is software enabled. That's not what either Bochs or QEMU do, though. (Though in the case of Bochs I cannot find the place where reception of IPIs is prevented for software-disabled APICs, so I'm not sure how much to trust it in this case). I'm not sure why software-disabled APICs could have different DFRs, if the APICs can receive NMI IPIs. I'll ask Intel. When changing the mode, we can't switch DFR synchronously, so it has to happen and NMI may be needed (watchdog?) before APIC configuration. Explicit statement might have been a hint to be _extra_ careful when using logical destination for INIT, NMI, ... I wonder what they'll say. Anyway, Paolo's patch seems to be in the right direction, I'd modify it a bit though: LDR=0 isn't addressable in any logical mode, so we can ignore APICs that don't set it and decide the mode by the last nonzero one. This works in a situation, where one part is configured for cluster and the rest is still in reset state. (It gets harder if we allow nonzero LDRs with different DFR ... we'd need to split our logical map to handle it.) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 758f838..6da303e1 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm) goto out; new-ldr_bits = 8; -/* flat mode is default */ -new-cid_shift = 8; -new-cid_mask = 0; -new-lid_mask = 0xff; new-broadcast = APIC_BROADCAST; kvm_for_each_vcpu(i, vcpu, kvm) { @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm) new-cid_mask = (1 KVM_X2APIC_CID_BITS) - 1; new-lid_mask = 0x; new-broadcast = X2APIC_BROADCAST; -} else if (kvm_apic_hw_enabled(apic)) { +} else if (kvm_apic_get_reg(apic, APIC_LDR)) { if (kvm_apic_get_reg(apic, APIC_DFR) == APIC_DFR_CLUSTER) { new-cid_shift = 4; I merged this patch and Nadav’s. Sorry for the late and long reply, but I got an issue with the new version (and my previous version as well). Indeed, the SDM states that DFR should be the same for enabled CPUs, and that the BIOS should get all CPUs in either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be in xAPIC/x2APIC mode. In my tests (which pass on bare-metal), I got a scenario in which some CPUs are in xAPIC mode, the BSP changed (which is currently not handled correctly by KVM) and the BSP has x2APIC enabled. All the core APICs are software-enabled. The expected behaviour is that the CPUs with x2APIC enabled would work in x2APIC mode. I think such a transitory scenario is possible on real-systems as well, perhaps during CPU hot-plug. It appears the previous version (before all of our changes) handled it better. I presume the most efficient way is to start determining the APIC logical mode from the BSP, and if it is disabled, traverse the rest of the CPUs until finding the first one with APIC enabled. Yet, I have not finished doing and checking the BSP fix and other dependent INIT signal handling fixes. In the meanwhile, would you be ok with restoring some of the previous behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to whether APIC is software enabled), otherwise use the configuration of the last enabled APIC? -- 8 — Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map --- arch/x86/kvm/lapic.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 9c90d31..6dc2be6 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm) struct kvm_apic_map *new, *old = NULL; struct kvm_vcpu *vcpu; int i; + bool any_enabled = false; new = kzalloc(sizeof(struct
Re: [RFC PATCH 1/5] KVM: redesing kvm_io_bus_ API to pass VCPU structure to the callbacks.
Hi Nikolay, typo in the title, On 11/24/2014 10:26 PM, Nikolay Nikolaev wrote: This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com --- arch/ia64/kvm/kvm-ia64.c |4 ++-- arch/powerpc/kvm/powerpc.c |4 ++-- arch/s390/kvm/diag.c |2 +- arch/x86/kvm/vmx.c |2 +- arch/x86/kvm/x86.c | 11 ++- include/linux/kvm_host.h | 10 +- virt/kvm/coalesced_mmio.c |5 +++-- virt/kvm/eventfd.c |4 ++-- virt/kvm/iodev.h | 23 +++ virt/kvm/kvm_main.c| 32 10 files changed, 53 insertions(+), 44 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index ec6b9ac..16f1d26 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -246,10 +246,10 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) return 0; mmio: if (p-dir) - r = kvm_io_bus_read(vcpu-kvm, KVM_MMIO_BUS, p-addr, + r = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, p-addr, p-size, p-data); else - r = kvm_io_bus_write(vcpu-kvm, KVM_MMIO_BUS, p-addr, + r = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, p-addr, p-size, p-data); if (r) printk(KERN_ERRkvm: No iodevice found! addr:%lx\n, p-addr); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index c1f8f53..5ac065b 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -814,7 +814,7 @@ int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu, idx = srcu_read_lock(vcpu-kvm-srcu); - ret = kvm_io_bus_read(vcpu-kvm, KVM_MMIO_BUS, run-mmio.phys_addr, + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, run-mmio.phys_addr, bytes, run-mmio.data); srcu_read_unlock(vcpu-kvm-srcu, idx); @@ -887,7 +887,7 @@ int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, idx = srcu_read_lock(vcpu-kvm-srcu); - ret = kvm_io_bus_write(vcpu-kvm, KVM_MMIO_BUS, run-mmio.phys_addr, + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, run-mmio.phys_addr, bytes, run-mmio.data); srcu_read_unlock(vcpu-kvm-srcu, idx); diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c index 9254aff..329ec75 100644 --- a/arch/s390/kvm/diag.c +++ b/arch/s390/kvm/diag.c @@ -213,7 +213,7 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu) * - gpr 3 contains the virtqueue index (passed as datamatch) * - gpr 4 contains the index on the bus (optionally) */ - ret = kvm_io_bus_write_cookie(vcpu-kvm, KVM_VIRTIO_CCW_NOTIFY_BUS, + ret = kvm_io_bus_write_cookie(vcpu, KVM_VIRTIO_CCW_NOTIFY_BUS, vcpu-run-s.regs.gprs[2] 0x, 8, vcpu-run-s.regs.gprs[3], vcpu-run-s.regs.gprs[4]); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3e556c6..e6d9f01 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5620,7 +5620,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu) gpa_t gpa; gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); - if (!kvm_io_bus_write(vcpu-kvm, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { + if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { skip_emulated_instruction(vcpu); return 1; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0033df3..bbf9375 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4053,7 +4053,7 @@ static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len, n = min(len, 8); if (!(vcpu-arch.apic !kvm_iodevice_write(vcpu-arch.apic-dev, addr, n, v)) vcpu param should be added above too. Eric - kvm_io_bus_write(vcpu-kvm, KVM_MMIO_BUS, addr, n, v)) + kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v)) break; handled += n; addr += n; @@ -4072,8 +4072,9 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v) do { n = min(len, 8); if (!(vcpu-arch.apic - !kvm_iodevice_read(vcpu-arch.apic-dev, addr, n, v)) - kvm_io_bus_read(vcpu-kvm, KVM_MMIO_BUS, addr, n, v)) + !kvm_iodevice_read(vcpu, vcpu-arch.apic-dev, + addr, n, v)) + kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, n, v)) break; trace_kvm_mmio(KVM_TRACE_MMIO_READ, n, addr, *(u64 *)v);
Re: [PATCH 4/7] KVM: arm64: guest debug, add SW break point support
On 26 November 2014 at 16:07, Andrew Jones drjo...@redhat.com wrote: There appears to be a typo in the ARM ARM. Subsection Software Breakpoint Instruction exception of D1.10.4 says BRK (ESR_EL2_EC_BRK64) is 0x39, but the table above that has it correctly as 0x3c. Thanks for pointing out this typo -- I've reported it to the appropriate documentation people. -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 12/12] s390x/virtio-ccw: enable virtio 1.0
virtio-ccw should now have everything in place to operate virtio 1.0 devices, so let's enable revision 1. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/virtio-ccw.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h index 03d5955..08edd8d 100644 --- a/hw/s390x/virtio-ccw.h +++ b/hw/s390x/virtio-ccw.h @@ -73,7 +73,7 @@ typedef struct VirtIOCCWDeviceClass { #define VIRTIO_CCW_FEATURE_SIZE NR_VIRTIO_FEATURE_WORDS /* The maximum virtio revision we support. */ -#define VIRTIO_CCW_REV_MAX 0 +#define VIRTIO_CCW_REV_MAX 1 /* Performance improves when virtqueue kick processing is decoupled from the * vcpu thread using ioeventfd for some devices. */ -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 11/12] virtio-net/virtio-blk: enable virtio 1.0
virtio-net (non-vhost) and virtio-blk have everything in place to support virtio 1.0: let's enable the feature bit for them. Note that VIRTIO_F_VERSION_1 is technically a transport feature; once every device is ready for virtio 1.0, we can move this setting this feature bit out of the individual devices. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/block/virtio-blk.c |4 hw/net/virtio-net.c |4 2 files changed, 8 insertions(+) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 6d86f60..3781f98 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -569,6 +569,10 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, unsigned int index, { VirtIOBlock *s = VIRTIO_BLK(vdev); +if (index == 1) { +features |= (1 (VIRTIO_F_VERSION_1 - 32)); +} + if (index 0) { return features; } diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 1e214b5..fcfc95f 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -447,6 +447,10 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, unsigned int index, VirtIONet *n = VIRTIO_NET(vdev); NetClientState *nc = qemu_get_queue(n-nic); +if (index == 1 !get_vhost_net(nc-peer)) { +features |= (1 (VIRTIO_F_VERSION_1 - 32)); +} + if (index 0) { return features; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 10/12] s390x/virtio-ccw: support virtio-1 set_vq format
Support the new CCW_CMD_SET_VQ format for virtio-1 devices. While we're at it, refactor the code a bit and enforce big endian fields (which had always been required, even for legacy). Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/virtio-ccw.c | 114 ++--- 1 file changed, 80 insertions(+), 34 deletions(-) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index e79f3b8..60d67a3 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -238,11 +238,20 @@ VirtualCssBus *virtual_css_bus_init(void) } /* Communication blocks used by several channel commands. */ -typedef struct VqInfoBlock { +typedef struct VqInfoBlockLegacy { uint64_t queue; uint32_t align; uint16_t index; uint16_t num; +} QEMU_PACKED VqInfoBlockLegacy; + +typedef struct VqInfoBlock { +uint64_t desc; +uint32_t res0; +uint16_t index; +uint16_t num; +uint64_t avail; +uint64_t used; } QEMU_PACKED VqInfoBlock; typedef struct VqConfigBlock { @@ -269,17 +278,20 @@ typedef struct VirtioRevInfo { } QEMU_PACKED VirtioRevInfo; /* Specify where the virtqueues for the subchannel are in guest memory. */ -static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align, - uint16_t index, uint16_t num) +static int virtio_ccw_set_vqs(SubchDev *sch, VqInfoBlock *info, + VqInfoBlockLegacy *linfo) { VirtIODevice *vdev = virtio_ccw_get_vdev(sch); +uint16_t index = info ? info-index : linfo-index; +uint16_t num = info ? info-num : linfo-num; +uint64_t desc = info ? info-desc : linfo-queue; if (index VIRTIO_PCI_QUEUE_MAX) { return -EINVAL; } /* Current code in virtio.c relies on 4K alignment. */ -if (addr (align != 4096)) { +if (linfo desc (linfo-align != 4096)) { return -EINVAL; } @@ -287,8 +299,12 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align, return -EINVAL; } -virtio_queue_set_addr(vdev, index, addr); -if (!addr) { +if (info) { +virtio_queue_set_rings(vdev, index, desc, info-avail, info-used); +} else { +virtio_queue_set_addr(vdev, index, desc); +} +if (!desc) { virtio_queue_set_vector(vdev, index, 0); } else { /* Fail if we don't have a big enough queue. */ @@ -303,10 +319,66 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align, return 0; } -static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) +static int virtio_ccw_handle_set_vq(SubchDev *sch, CCW1 ccw, bool check_len, +bool is_legacy) { int ret; VqInfoBlock info; +VqInfoBlockLegacy linfo; +size_t info_len = is_legacy ? sizeof(linfo) : sizeof(info); + +if (check_len) { +if (ccw.count != info_len) { +return -EINVAL; +} +} else if (ccw.count info_len) { +/* Can't execute command. */ +return -EINVAL; +} +if (!ccw.cda) { +return -EFAULT; +} +if (is_legacy) { +linfo.queue = ldq_be_phys(address_space_memory, ccw.cda); +linfo.align = ldl_be_phys(address_space_memory, + ccw.cda + sizeof(linfo.queue)); +linfo.index = lduw_be_phys(address_space_memory, + ccw.cda + sizeof(linfo.queue) + + sizeof(linfo.align)); +linfo.num = lduw_be_phys(address_space_memory, + ccw.cda + sizeof(linfo.queue) + + sizeof(linfo.align) + + sizeof(linfo.index)); +ret = virtio_ccw_set_vqs(sch, NULL, linfo); +} else { +info.desc = ldq_be_phys(address_space_memory, ccw.cda); +info.index = lduw_be_phys(address_space_memory, + ccw.cda + sizeof(info.desc) + + sizeof(info.res0)); +info.num = lduw_be_phys(address_space_memory, +ccw.cda + sizeof(info.desc) + + sizeof(info.res0) + + sizeof(info.index)); +info.avail = ldq_be_phys(address_space_memory, + ccw.cda + sizeof(info.desc) + + sizeof(info.res0) + + sizeof(info.index) + + sizeof(info.num)); +info.used = ldq_be_phys(address_space_memory, +ccw.cda + sizeof(info.desc) ++ sizeof(info.res0) ++ sizeof(info.index) ++ sizeof(info.num) ++ sizeof(info.avail)); +ret = virtio_ccw_set_vqs(sch,
[PATCH RFC v3 07/12] dataplane: allow virtio-1 devices
Handle endianness conversion for virtio-1 virtqueues correctly. Note that dataplane now needs to be built per-target. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/block/dataplane/virtio-blk.c |4 +- hw/scsi/virtio-scsi-dataplane.c |2 +- hw/virtio/Makefile.objs |2 +- hw/virtio/dataplane/Makefile.objs |2 +- hw/virtio/dataplane/vring.c | 86 ++--- include/hw/virtio/dataplane/vring-accessors.h | 75 + include/hw/virtio/dataplane/vring.h | 14 +--- 7 files changed, 131 insertions(+), 54 deletions(-) create mode 100644 include/hw/virtio/dataplane/vring-accessors.h diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index 1222a37..2d8cc15 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -16,7 +16,9 @@ #include qemu/iov.h #include qemu/thread.h #include qemu/error-report.h +#include hw/virtio/virtio-access.h #include hw/virtio/dataplane/vring.h +#include hw/virtio/dataplane/vring-accessors.h #include sysemu/block-backend.h #include hw/virtio/virtio-blk.h #include virtio-blk.h @@ -75,7 +77,7 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status) VirtIOBlockDataPlane *s = req-dev-dataplane; stb_p(req-in-status, status); -vring_push(req-dev-dataplane-vring, req-elem, +vring_push(s-vdev, req-dev-dataplane-vring, req-elem, req-qiov.size + sizeof(*req-in)); /* Suppress notification to guest by BH and its scheduled diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c index 03a1e8c..418d73b 100644 --- a/hw/scsi/virtio-scsi-dataplane.c +++ b/hw/scsi/virtio-scsi-dataplane.c @@ -94,7 +94,7 @@ void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req) { VirtIODevice *vdev = VIRTIO_DEVICE(req-vring-parent); -vring_push(req-vring-vring, req-elem, +vring_push(vdev, req-vring-vring, req-elem, req-qsgl.size + req-resp_iov.size); if (vring_should_notify(vdev, req-vring-vring)) { diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs index d21c397..19b224a 100644 --- a/hw/virtio/Makefile.objs +++ b/hw/virtio/Makefile.objs @@ -2,7 +2,7 @@ common-obj-y += virtio-rng.o common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o common-obj-y += virtio-bus.o common-obj-y += virtio-mmio.o -common-obj-$(CONFIG_VIRTIO) += dataplane/ +obj-$(CONFIG_VIRTIO) += dataplane/ obj-y += virtio.o virtio-balloon.o obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o diff --git a/hw/virtio/dataplane/Makefile.objs b/hw/virtio/dataplane/Makefile.objs index 9a8cfc0..753a9ca 100644 --- a/hw/virtio/dataplane/Makefile.objs +++ b/hw/virtio/dataplane/Makefile.objs @@ -1 +1 @@ -common-obj-y += vring.o +obj-y += vring.o diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c index a051775..0da8d6b 100644 --- a/hw/virtio/dataplane/vring.c +++ b/hw/virtio/dataplane/vring.c @@ -18,7 +18,9 @@ #include hw/hw.h #include exec/memory.h #include exec/address-spaces.h +#include hw/virtio/virtio-access.h #include hw/virtio/dataplane/vring.h +#include hw/virtio/dataplane/vring-accessors.h #include qemu/error-report.h /* vring_map can be coupled with vring_unmap or (if you still have the @@ -83,7 +85,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n) vring_init(vring-vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096); vring-last_avail_idx = virtio_queue_get_last_avail_idx(vdev, n); -vring-last_used_idx = vring-vr.used-idx; +vring-last_used_idx = vring_get_used_idx(vdev, vring); vring-signalled_used = 0; vring-signalled_used_valid = false; @@ -104,7 +106,7 @@ void vring_teardown(Vring *vring, VirtIODevice *vdev, int n) void vring_disable_notification(VirtIODevice *vdev, Vring *vring) { if (!(vdev-guest_features[0] (1 VIRTIO_RING_F_EVENT_IDX))) { -vring-vr.used-flags |= VRING_USED_F_NO_NOTIFY; +vring_set_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY); } } @@ -117,10 +119,10 @@ bool vring_enable_notification(VirtIODevice *vdev, Vring *vring) if (vdev-guest_features[0] (1 VIRTIO_RING_F_EVENT_IDX)) { vring_avail_event(vring-vr) = vring-vr.avail-idx; } else { -vring-vr.used-flags = ~VRING_USED_F_NO_NOTIFY; +vring_clear_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY); } smp_mb(); /* ensure update is seen before reading avail_idx */ -return !vring_more_avail(vring); +return !vring_more_avail(vdev, vring); } /* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */ @@ -134,12 +136,13 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring) smp_mb(); if ((vdev-guest_features[0] VIRTIO_F_NOTIFY_ON_EMPTY) -unlikely(vring-vr.avail-idx == vring-last_avail_idx)) { +unlikely(!vring_more_avail(vdev,
[PATCH RFC v3 06/12] virtio: allow virtio-1 queue layout
For virtio-1 devices, we allow a more complex queue layout that doesn't require descriptor table and rings on a physically-contigous memory area: add virtio_queue_set_rings() to allow transports to set this up. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/virtio/virtio.c | 16 include/hw/virtio/virtio.h |2 ++ 2 files changed, 18 insertions(+) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 4149f45..2c6bb91 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq) { hwaddr pa = vq-pa; +if (pa == -1ULL) { +/* + * This is a virtio-1 style vq that has already been setup + * in virtio_queue_set. + */ +return; +} vq-vring.desc = pa; vq-vring.avail = pa + vq-vring.num * sizeof(VRingDesc); vq-vring.used = vring_align(vq-vring.avail + @@ -719,6 +726,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) return vdev-vq[n].pa; } +void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, +hwaddr avail, hwaddr used) +{ +vdev-vq[n].pa = -1ULL; +vdev-vq[n].vring.desc = desc; +vdev-vq[n].vring.avail = avail; +vdev-vq[n].vring.used = used; +} + void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) { /* Don't allow guest to flip queue between existent and diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index 40e567c..f840320 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -227,6 +227,8 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr); hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n); void virtio_queue_set_num(VirtIODevice *vdev, int n, int num); int virtio_queue_get_num(VirtIODevice *vdev, int n); +void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc, +hwaddr avail, hwaddr used); void virtio_queue_set_align(VirtIODevice *vdev, int n, int align); void virtio_queue_notify(VirtIODevice *vdev, int n); uint16_t virtio_queue_vector(VirtIODevice *vdev, int n); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 09/12] s390x/virtio-ccw: add virtio set-revision call
From: Thomas Huth th...@linux.vnet.ibm.com Handle the virtio-ccw revision according to what the guest sets. When revision 1 is selected, we have a virtio-1 standard device with byteswapping for the virtio rings. When a channel gets disabled, we have to revert to the legacy behavior in case the next user of the device does not negotiate the revision 1 anymore (e.g. the boot firmware uses revision 1, but the operating system only uses the legacy mode). Note that revisions 0 are still disabled; but we still extend the feature bit size to be able to handle the VERSION_1 bit. Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/virtio-ccw.c | 52 + hw/s390x/virtio-ccw.h |7 ++- 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 62ec852..e79f3b8 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -20,9 +20,11 @@ #include hw/virtio/virtio-net.h #include hw/sysbus.h #include qemu/bitops.h +#include hw/virtio/virtio-access.h #include hw/virtio/virtio-bus.h #include hw/s390x/adapter.h #include hw/s390x/s390_flic.h +#include linux/virtio_config.h #include ioinst.h #include css.h @@ -260,6 +262,12 @@ typedef struct VirtioThinintInfo { uint8_t isc; } QEMU_PACKED VirtioThinintInfo; +typedef struct VirtioRevInfo { +uint16_t revision; +uint16_t length; +uint8_t data[0]; +} QEMU_PACKED VirtioRevInfo; + /* Specify where the virtqueues for the subchannel are in guest memory. */ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align, uint16_t index, uint16_t num) @@ -299,6 +307,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) { int ret; VqInfoBlock info; +VirtioRevInfo revinfo; uint8_t status; VirtioFeatDesc features; void *config; @@ -373,6 +382,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) ccw.cda + sizeof(features.features)); if (features.index ARRAY_SIZE(dev-host_features)) { features.features = dev-host_features[features.index]; +/* + * Don't offer version 1 to the guest if it did not + * negotiate at least revision 1. + */ +if (features.index == 1 dev-revision = 0) { +features.features = ~(1 (VIRTIO_F_VERSION_1 - 32)); +} } else { /* Return zeroes if the guest supports more feature bits. */ features.features = 0; @@ -400,6 +416,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) ccw.cda + sizeof(features.features)); features.features = ldl_le_phys(address_space_memory, ccw.cda); if (features.index ARRAY_SIZE(vdev-guest_features)) { +/* + * The guest should not set version 1 if it didn't + * negotiate a revision = 1. + */ +if (features.index == 1 dev-revision = 0) { +features.features = ~(1 (VIRTIO_F_VERSION_1 - 32)); +} virtio_set_features(vdev, features.index, features.features); } else { /* @@ -600,6 +623,25 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) } } break; +case CCW_CMD_SET_VIRTIO_REV: +len = sizeof(revinfo); +if (ccw.count len || (check_len ccw.count len)) { +ret = -EINVAL; +break; +} +if (!ccw.cda) { +ret = -EFAULT; +break; +} +cpu_physical_memory_read(ccw.cda, revinfo, len); +if (dev-revision = 0 || +revinfo.revision VIRTIO_CCW_REV_MAX) { +ret = -ENOSYS; +break; +} +ret = 0; +dev-revision = revinfo.revision; +break; default: ret = -ENOSYS; break; @@ -607,6 +649,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) return ret; } +static void virtio_sch_disable_cb(SubchDev *sch) +{ +VirtioCcwDevice *dev = sch-driver_data; + +dev-revision = -1; +} + static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev) { unsigned int cssid = 0; @@ -733,6 +782,7 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev) css_sch_build_virtual_schib(sch, 0, VIRTIO_CCW_CHPID_TYPE); sch-ccw_cb = virtio_ccw_cb; +sch-disable_cb = virtio_sch_disable_cb; /* Build senseid data. */ memset(sch-id, 0, sizeof(SenseId)); @@ -740,6 +790,8 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev) sch-id.cu_type = VIRTIO_CCW_CU_TYPE; sch-id.cu_model = vdev-device_id; +dev-revision =
[PATCH RFC v3 02/12] virtio: cull virtio_bus_set_vdev_features
The only user of this function was virtio-ccw, and it should use virtio_set_features() like everybody else: We need to make sure that bad features are masked out properly, which this function did not do. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/virtio-ccw.c |3 +-- hw/virtio/virtio-bus.c | 14 -- include/hw/virtio/virtio-bus.h |3 --- 3 files changed, 1 insertion(+), 19 deletions(-) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index ea236c9..84f17bc 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -400,8 +400,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) ccw.cda + sizeof(features.features)); features.features = ldl_le_phys(address_space_memory, ccw.cda); if (features.index ARRAY_SIZE(dev-host_features)) { -virtio_bus_set_vdev_features(dev-bus, features.features); -vdev-guest_features = features.features; +virtio_set_features(vdev, features.features); } else { /* * If the guest supports more feature bits, assert that it diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c index eb77019..a8ffa07 100644 --- a/hw/virtio/virtio-bus.c +++ b/hw/virtio/virtio-bus.c @@ -109,20 +109,6 @@ uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus, return k-get_features(vdev, requested_features); } -/* Set the features of the plugged device. */ -void virtio_bus_set_vdev_features(VirtioBusState *bus, - uint32_t requested_features) -{ -VirtIODevice *vdev = virtio_bus_get_device(bus); -VirtioDeviceClass *k; - -assert(vdev != NULL); -k = VIRTIO_DEVICE_GET_CLASS(vdev); -if (k-set_features != NULL) { -k-set_features(vdev, requested_features); -} -} - /* Get bad features of the plugged device. */ uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus) { diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h index 0756545..0d2e7b4 100644 --- a/include/hw/virtio/virtio-bus.h +++ b/include/hw/virtio/virtio-bus.h @@ -84,9 +84,6 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus); /* Get the features of the plugged device. */ uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus, uint32_t requested_features); -/* Set the features of the plugged device. */ -void virtio_bus_set_vdev_features(VirtioBusState *bus, - uint32_t requested_features); /* Get bad features of the plugged device. */ uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus); /* Get config of the plugged device. */ -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 08/12] s390x/css: Add a callback for when subchannel gets disabled
From: Thomas Huth th...@linux.vnet.ibm.com We need a possibility to run code when a subchannel gets disabled. This patch adds the necessary infrastructure. Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/css.c | 12 hw/s390x/css.h |1 + 2 files changed, 13 insertions(+) diff --git a/hw/s390x/css.c b/hw/s390x/css.c index b67c039..735ec55 100644 --- a/hw/s390x/css.c +++ b/hw/s390x/css.c @@ -588,6 +588,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib) { SCSW *s = sch-curr_status.scsw; PMCW *p = sch-curr_status.pmcw; +uint16_t oldflags; int ret; SCHIB schib; @@ -610,6 +611,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib) copy_schib_from_guest(schib, orig_schib); /* Only update the program-modifiable fields. */ p-intparm = schib.pmcw.intparm; +oldflags = p-flags; p-flags = ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA | PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME | PMCW_FLAGS_MASK_MP); @@ -625,6 +627,12 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib) (PMCW_CHARS_MASK_MBFC | PMCW_CHARS_MASK_CSENSE); sch-curr_status.mba = schib.mba; +/* Has the channel been disabled? */ +if (sch-disable_cb (oldflags PMCW_FLAGS_MASK_ENA) != 0 + (p-flags PMCW_FLAGS_MASK_ENA) == 0) { +sch-disable_cb(sch); +} + ret = 0; out: @@ -1443,6 +1451,10 @@ void css_reset_sch(SubchDev *sch) { PMCW *p = sch-curr_status.pmcw; +if ((p-flags PMCW_FLAGS_MASK_ENA) != 0 sch-disable_cb) { +sch-disable_cb(sch); +} + p-intparm = 0; p-flags = ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA | PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME | diff --git a/hw/s390x/css.h b/hw/s390x/css.h index 33104ac..7fa807b 100644 --- a/hw/s390x/css.h +++ b/hw/s390x/css.h @@ -81,6 +81,7 @@ struct SubchDev { uint8_t ccw_no_data_cnt; /* transport-provided data: */ int (*ccw_cb) (SubchDev *, CCW1); +void (*disable_cb)(SubchDev *); SenseId id; void *driver_data; }; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 05/12] virtio: introduce legacy virtio devices
Introduce a helper function to indicate whether a virtio device is operating in legacy or virtio standard mode. It may be used to make decisions about the endianess of virtio accesses and other virtio-1 specific changes, enabling us to support transitional devices. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/virtio/virtio.c|6 +- include/hw/virtio/virtio-access.h |4 include/hw/virtio/virtio.h| 13 +++-- 3 files changed, 20 insertions(+), 3 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 2eb5d3c..4149f45 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -883,7 +883,11 @@ static bool virtio_device_endian_needed(void *opaque) VirtIODevice *vdev = opaque; assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); -return vdev-device_endian != virtio_default_endian(); +if (virtio_device_is_legacy(vdev)) { +return vdev-device_endian != virtio_default_endian(); +} +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return vdev-device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE; } static const VMStateDescription vmstate_virtio_device_endian = { diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index 46456fd..c123ee0 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -19,6 +19,10 @@ static inline bool virtio_access_is_big_endian(VirtIODevice *vdev) { +if (!virtio_device_is_legacy(vdev)) { +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return false; +} #if defined(TARGET_IS_BIENDIAN) return virtio_is_big_endian(vdev); #elif defined(TARGET_WORDS_BIGENDIAN) diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index b408166..40e567c 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -275,9 +275,18 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign, void virtio_queue_notify_vq(VirtQueue *vq); void virtio_irq(VirtQueue *vq); +static inline bool virtio_device_is_legacy(VirtIODevice *vdev) +{ +return !(vdev-guest_features[1] (1 (VIRTIO_F_VERSION_1 - 32))); +} + static inline bool virtio_is_big_endian(VirtIODevice *vdev) { -assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); -return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG; +if (virtio_device_is_legacy(vdev)) { +assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); +return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG; +} +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return false; } #endif -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 04/12] s390x/virtio-ccw: fix check for WRITE_FEAT
We need to check guest feature size, not host feature size to find out whether we should call virtio_set_features(). This check is possible now that vdev-guest_features is an array. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/virtio-ccw.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 2f52b82..62ec852 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -399,7 +399,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) features.index = ldub_phys(address_space_memory, ccw.cda + sizeof(features.features)); features.features = ldl_le_phys(address_space_memory, ccw.cda); -if (features.index ARRAY_SIZE(dev-host_features)) { +if (features.index ARRAY_SIZE(vdev-guest_features)) { virtio_set_features(vdev, features.index, features.features); } else { /* -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 00/12] qemu: towards virtio-1 host support
Next version of virtio-1 patches for qemu. Only change from v2 is splitting out the vring accessors into a separate header file - should hopefully fix the build issues. Cornelia Huck (9): virtio: cull virtio_bus_set_vdev_features virtio: support more feature bits s390x/virtio-ccw: fix check for WRITE_FEAT virtio: introduce legacy virtio devices virtio: allow virtio-1 queue layout dataplane: allow virtio-1 devices s390x/virtio-ccw: support virtio-1 set_vq format virtio-net/virtio-blk: enable virtio 1.0 s390x/virtio-ccw: enable virtio 1.0 Thomas Huth (3): linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1 s390x/css: Add a callback for when subchannel gets disabled s390x/virtio-ccw: add virtio set-revision call hw/9pfs/virtio-9p-device.c|7 +- hw/block/dataplane/virtio-blk.c |4 +- hw/block/virtio-blk.c | 13 +- hw/char/virtio-serial-bus.c |9 +- hw/net/virtio-net.c | 42 -- hw/s390x/css.c| 12 ++ hw/s390x/css.h|1 + hw/s390x/s390-virtio-bus.c|9 +- hw/s390x/virtio-ccw.c | 186 +++-- hw/s390x/virtio-ccw.h |7 +- hw/scsi/vhost-scsi.c |7 +- hw/scsi/virtio-scsi-dataplane.c |2 +- hw/scsi/virtio-scsi.c | 10 +- hw/virtio/Makefile.objs |2 +- hw/virtio/dataplane/Makefile.objs |2 +- hw/virtio/dataplane/vring.c | 96 +++-- hw/virtio/virtio-balloon.c|8 +- hw/virtio/virtio-bus.c| 23 +-- hw/virtio/virtio-mmio.c |9 +- hw/virtio/virtio-pci.c| 13 +- hw/virtio/virtio-rng.c|2 +- hw/virtio/virtio.c| 51 +-- include/hw/virtio/dataplane/vring-accessors.h | 75 ++ include/hw/virtio/dataplane/vring.h | 14 +- include/hw/virtio/virtio-access.h |4 + include/hw/virtio/virtio-bus.h| 10 +- include/hw/virtio/virtio.h| 34 - linux-headers/linux/virtio_config.h |3 + 28 files changed, 467 insertions(+), 188 deletions(-) create mode 100644 include/hw/virtio/dataplane/vring-accessors.h -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 01/12] linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1
From: Thomas Huth th...@linux.vnet.ibm.com Add the new VIRTIO_F_VERSION_1 definition to the virtio_config.h linux header. Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- linux-headers/linux/virtio_config.h |3 +++ 1 file changed, 3 insertions(+) diff --git a/linux-headers/linux/virtio_config.h b/linux-headers/linux/virtio_config.h index 75dc20b..16aa289 100644 --- a/linux-headers/linux/virtio_config.h +++ b/linux-headers/linux/virtio_config.h @@ -54,4 +54,7 @@ /* Can the device handle any descriptor layout? */ #define VIRTIO_F_ANY_LAYOUT27 +/* v1.0 compliant. */ +#define VIRTIO_F_VERSION_1 32 + #endif /* _LINUX_VIRTIO_CONFIG_H */ -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 03/12] virtio: support more feature bits
With virtio-1, we support more than 32 feature bits. Let's make vdev-guest_features depend on the number of supported feature bits, allowing us to grow the feature bits automatically. We also need to enhance the internal functions dealing with getting and setting features with an additional index field, so that all feature bits may be accessed (in chunks of 32 bits). vhost and migration have been ignored for now. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/9pfs/virtio-9p-device.c |7 ++- hw/block/virtio-blk.c |9 +++-- hw/char/virtio-serial-bus.c|9 +++-- hw/net/virtio-net.c| 38 ++ hw/s390x/s390-virtio-bus.c |9 + hw/s390x/virtio-ccw.c | 17 ++--- hw/scsi/vhost-scsi.c |7 +-- hw/scsi/virtio-scsi.c | 10 +- hw/virtio/dataplane/vring.c| 10 +- hw/virtio/virtio-balloon.c |8 ++-- hw/virtio/virtio-bus.c |9 + hw/virtio/virtio-mmio.c|9 + hw/virtio/virtio-pci.c | 13 +++-- hw/virtio/virtio-rng.c |2 +- hw/virtio/virtio.c | 29 + include/hw/virtio/virtio-bus.h |7 --- include/hw/virtio/virtio.h | 19 ++- 17 files changed, 135 insertions(+), 77 deletions(-) diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index 2572747..c29c8c8 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -21,8 +21,13 @@ #include virtio-9p-coth.h #include hw/virtio/virtio-access.h -static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features) +static uint32_t virtio_9p_get_features(VirtIODevice *vdev, unsigned int index, + uint32_t features) { +if (index 0) { +return features; +} + features |= 1 VIRTIO_9P_MOUNT_TAG; return features; } diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index b19b102..6d86f60 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -564,10 +564,15 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config) aio_context_release(blk_get_aio_context(s-blk)); } -static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features) +static uint32_t virtio_blk_get_features(VirtIODevice *vdev, unsigned int index, +uint32_t features) { VirtIOBlock *s = VIRTIO_BLK(vdev); +if (index 0) { +return features; +} + features |= (1 VIRTIO_BLK_F_SEG_MAX); features |= (1 VIRTIO_BLK_F_GEOMETRY); features |= (1 VIRTIO_BLK_F_TOPOLOGY); @@ -601,7 +606,7 @@ static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status) return; } -features = vdev-guest_features; +features = vdev-guest_features[0]; /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send * cache flushes. Thus, the auto writethrough behavior is never diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index a7b1b68..55de504 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -75,7 +75,7 @@ static VirtIOSerialPort *find_port_by_name(char *name) static bool use_multiport(VirtIOSerial *vser) { VirtIODevice *vdev = VIRTIO_DEVICE(vser); -return vdev-guest_features (1 VIRTIO_CONSOLE_F_MULTIPORT); +return vdev-guest_features[0] (1 VIRTIO_CONSOLE_F_MULTIPORT); } static size_t write_to_port(VirtIOSerialPort *port, @@ -467,10 +467,15 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq) { } -static uint32_t get_features(VirtIODevice *vdev, uint32_t features) +static uint32_t get_features(VirtIODevice *vdev, unsigned int index, + uint32_t features) { VirtIOSerial *vser; +if (index 0) { +return features; +} + vser = VIRTIO_SERIAL(vdev); if (vser-bus.max_nr_ports 1) { diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 9b88775..1e214b5 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -86,7 +86,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config) memcpy(netcfg, config, n-config_size); -if (!(vdev-guest_features VIRTIO_NET_F_CTRL_MAC_ADDR 1) +if (!(vdev-guest_features[0] VIRTIO_NET_F_CTRL_MAC_ADDR 1) memcmp(netcfg.mac, n-mac, ETH_ALEN)) { memcpy(n-mac, netcfg.mac, ETH_ALEN); qemu_format_nic_info_str(qemu_get_queue(n-nic), n-mac); @@ -305,7 +305,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc) info-multicast_table = str_list; info-vlan_table = get_vlan_table(n); -if (!((1 VIRTIO_NET_F_CTRL_VLAN) vdev-guest_features)) { +if (!((1 VIRTIO_NET_F_CTRL_VLAN)
Re: [CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host
2014-11-26 17:26+0100, Paolo Bonzini: On 26/11/2014 15:42, Radim Krčmář wrote: I'm not sure what is more future proof. :) I wonder if native_xrstor could be a problem the day XRSTORS actually sets/restores MSRs as the processor documentation promises. XRSTORS won't affect the guest in any way, we are just going to use it to convert the xsave, so any side-effects are going to stay in the host. (This could break the host though.) Yes, that's the problem. :) (It would be a bug in Linux's xsave API, if we were using it correctly.) My main presumption is that XSAVE*-XRSTOR*-XSAVE-XRSTOR has the same result as XSAVE-XRSTOR, because we are only interested in the state, not in any metadata. (If it isn't possible to combine intructions, like XSAVE after XRSTORS, this solution won't work.) Yes, that should be right. But actually what KVM would do it is XRSTOR-XSAVE*-XRSTOR*-XSAVE. The problem here is the side effects of doing XRSTORS far from a guest entry... The entry can be aborted after doing XRSTORS, so we are going to know if this doesn't work :) though that would likely be handled by load_guest_fpu/put_guest_fpu. Yes, I don't see a principal difference between manipulating xsave for vmentry and this conversion, it can be wrapped in the same way. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] KVM: arm64: guest debug, HW assisted debug support
On Tue, Nov 25, 2014 at 04:10:05PM +, Alex Bennée wrote: This adds support for userspace to control the HW debug registers for guest debug. We'll only copy the $ARCH defined number across as that's all that hyp.S will use anyway. I've moved some helper functions into the hw_breakpoint.h header for re-use. As with single step we need to tweak the guest registers to enable the exceptions but we don't want to overwrite the guest copy of these registers so this is done close to the guest entry. Two new capabilities have been added to the KVM_EXTENSION ioctl to allow userspace to query the number of hardware break and watch points available on the host hardware. Signed-off-by: Alex Bennée alex.ben...@linaro.org diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 9383359..5e8c673 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2593,7 +2593,7 @@ The top 16 bits of the control field are architecture specific control flags which can include the following: - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64] - - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390] + - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64] - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86] - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86] - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390] @@ -2606,7 +2606,10 @@ we need to ensure the guest vCPUs architecture specific registers are updated to the correct (supplied) values. The second part of the structure is architecture specific and -typically contains a set of debug registers. +typically contains a set of debug registers. For arm64 the number of +debug registers is implementation defined and can be determined by +querying the KVM_CAP_GUEST_DEBUG_HW_BPS and KVM_CAP_GUEST_DEBUG_HW_WPS +capabilities. When debug events exit the main run loop with the reason KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index a76daae..c8ec23a 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -39,6 +39,7 @@ #include asm/cacheflush.h #include asm/virt.h #include asm/debug-monitors.h +#include asm/hw_breakpoint.h #include asm/kvm_arm.h #include asm/kvm_asm.h #include asm/kvm_mmu.h @@ -341,8 +342,37 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, /* Hardware assisted Break and Watch points */ if (vcpu-guest_debug KVM_GUESTDBG_USE_HW_BP) { - kvm_info(HW BP support requested, not yet implemented\n); - return -EINVAL; + int i; + int nb = get_num_brps(); + int nw = get_num_wrps(); + + /* Copy across up to IMPDEF debug registers to our + * shadow copy in the vcpu structure. The hyp.S code + * will then set them up before we re-enter the guest. + */ + memcpy(vcpu-arch.guest_debug_regs.dbg_bcr, + dbg-arch.dbg_bcr, sizeof(__u64)*nb); + memcpy(vcpu-arch.guest_debug_regs.dbg_bvr, + dbg-arch.dbg_bvr, sizeof(__u64)*nb); + memcpy(vcpu-arch.guest_debug_regs.dbg_wcr, + dbg-arch.dbg_wcr, sizeof(__u64)*nw); + memcpy(vcpu-arch.guest_debug_regs.dbg_wvr, + dbg-arch.dbg_wvr, sizeof(__u64)*nw); + + kvm_info(HW BP support requested\n); + for (i = 0; i nb; i++) { + kvm_info(%d: dbg_bcr=0x%llx dbg_bvr=0x%llx\n, + i, + vcpu-arch.guest_debug_regs.dbg_bcr[i], + vcpu-arch.guest_debug_regs.dbg_bvr[i]); + } + for (i = 0; i nw; i++) { + kvm_info(%d: dbg_wcr=0x%llx dbg_wvr=0x%llx\n, + i, + vcpu-arch.guest_debug_regs.dbg_wcr[i], + vcpu-arch.guest_debug_regs.dbg_wvr[i]); + } I think we decided to drop the kvm_info's. Here's several more lines of logging to drop too. + route_el2 = true; } /* If we are going to handle any debug exceptions we need to diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 52b484b..c450552 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -130,6 +130,18 @@ static inline void ptrace_hw_copy_thread(struct task_struct *task) } #endif +/* Determine number of BRP registers available. */ +static inline int get_num_brps(void) +{ + return ((read_cpuid(ID_AA64DFR0_EL1) 12) 0xf) + 1; +} + +/* Determine number of WRP registers available. */ +static inline int
Re: [PATCH 2/7] KVM: arm: guest debug, define API headers
On Wed, Nov 26, 2014 at 05:46:05PM +0100, Paolo Bonzini wrote: On 26/11/2014 15:58, Alex Bennée wrote: Andrew Jones drjo...@redhat.com writes: On Tue, Nov 25, 2014 at 04:10:00PM +, Alex Bennée wrote: This commit defines the API headers for guest debugging. There are two architecture specific debug structures: snip +/* Architecture related debug defines - upper 16 bits of + * kvm_guest_debug-control + */ +#define KVM_GUESTDBG_USE_SW_BP_SHIFT 16 +#define KVM_GUESTDBG_USE_SW_BP (1 KVM_GUESTDBG_USE_SW_BP_SHIFT) +#define KVM_GUESTDBG_USE_HW_BP_SHIFT 17 +#define KVM_GUESTDBG_USE_HW_BP (1 KVM_GUESTDBG_USE_HW_BP_SHIFT) + I see this are defined in arch/x86/include/uapi/asm/kvm.h, so you needed to reproduce them here, but shouldn't they be promoted to include/uapi/linux/kvm.h instead? Well if we move them to common uapi we either restrict the $ARCH specific options that don't have SW/HW BKPTS (would be weird but...) or make them generic in the lower 16 bits (breaks API). But in principle I have no objection if other don't. I think it's a matter of personal taste. Architecture-specific means not all architectures may support it, but it's certainly a good idea to reuse the same value if multiple architectures do support a #define. What you did is fine, another possibility is to do #define __KVM_GUESTDBG_USE_SW_BP (1 16) in include/uapi/linux/kvm.h, and #define KVM_GUESTDBG_USE_SW_BP __KVM_GUESTDBG_USE_SW_BP in the arch-specific file. Andrew, is this closer to what you intended? I just reread Documentation/virtual/kvm/api.txt a bit more closely and see that these fall in the architecture specific control region of the field. So forget what I said. But your suggestion of __KVM_GUESTDBG_USE_SW_BP looks like a good idea to me. drew -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/21] KVM: x86: Software disabled APIC should still deliver NMIs
On 26/11/2014 18:01, Nadav Amit wrote: Paolo Bonzini pbonz...@redhat.com wrote: On 06/11/2014 17:45, Radim Krčmář wrote: 2014-11-06 10:34+0100, Paolo Bonzini: On 05/11/2014 21:45, Nadav Amit wrote: If I understand the SDM correctly, in such scenario (all APICs are software disabled) the mode is left as the default - flat mode (see APIC doesn't have any global mode (it is just KVM's simplification), so when a message lands on the system bus, it just compares MDA with LDR and DFR ... section 10.6.2.2 Logical Destination Mode”): All processors that have their APIC software enabled (using the spurious vector enable/disable bit) must have their DFRs (Destination Format Registers) programmed identically. The default mode for DFR is flat mode.” I think the default mode points to reset state, which is flat DFR; and it is mentioned only because of the following sentence If you are using cluster mode, DFRs must be programmed before the APIC is software enabled. That's not what either Bochs or QEMU do, though. (Though in the case of Bochs I cannot find the place where reception of IPIs is prevented for software-disabled APICs, so I'm not sure how much to trust it in this case). I'm not sure why software-disabled APICs could have different DFRs, if the APICs can receive NMI IPIs. I'll ask Intel. When changing the mode, we can't switch DFR synchronously, so it has to happen and NMI may be needed (watchdog?) before APIC configuration. Explicit statement might have been a hint to be _extra_ careful when using logical destination for INIT, NMI, ... I wonder what they'll say. Anyway, Paolo's patch seems to be in the right direction, I'd modify it a bit though: LDR=0 isn't addressable in any logical mode, so we can ignore APICs that don't set it and decide the mode by the last nonzero one. This works in a situation, where one part is configured for cluster and the rest is still in reset state. (It gets harder if we allow nonzero LDRs with different DFR ... we'd need to split our logical map to handle it.) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 758f838..6da303e1 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -148,10 +148,6 @@ static void recalculate_apic_map(struct kvm *kvm) goto out; new-ldr_bits = 8; - /* flat mode is default */ - new-cid_shift = 8; - new-cid_mask = 0; - new-lid_mask = 0xff; new-broadcast = APIC_BROADCAST; kvm_for_each_vcpu(i, vcpu, kvm) { @@ -166,7 +162,7 @@ static void recalculate_apic_map(struct kvm *kvm) new-cid_mask = (1 KVM_X2APIC_CID_BITS) - 1; new-lid_mask = 0x; new-broadcast = X2APIC_BROADCAST; - } else if (kvm_apic_hw_enabled(apic)) { + } else if (kvm_apic_get_reg(apic, APIC_LDR)) { if (kvm_apic_get_reg(apic, APIC_DFR) == APIC_DFR_CLUSTER) { new-cid_shift = 4; I merged this patch and Nadav’s. Sorry for the late and long reply, but I got an issue with the new version (and my previous version as well). Indeed, the SDM states that DFR should be the same for enabled CPUs, and that the BIOS should get all CPUs in either xAPIC or x2APIC. Yet, there is nothing that says all CPUs need to be in xAPIC/x2APIC mode. In my tests (which pass on bare-metal), I got a scenario in which some CPUs are in xAPIC mode, the BSP changed (which is currently not handled correctly by KVM) and the BSP has x2APIC enabled. All the core APICs are software-enabled. The expected behaviour is that the CPUs with x2APIC enabled would work in x2APIC mode. I think such a transitory scenario is possible on real-systems as well, perhaps during CPU hot-plug. It appears the previous version (before all of our changes) handled it better. I presume the most efficient way is to start determining the APIC logical mode from the BSP, and if it is disabled, traverse the rest of the CPUs until finding the first one with APIC enabled. Yet, I have not finished doing and checking the BSP fix and other dependent INIT signal handling fixes. In the meanwhile, would you be ok with restoring some of the previous behaviour - i.e., x2APIC is enabled if any CPU turned it on (regardless to whether APIC is software enabled), otherwise use the configuration of the last enabled APIC? -- 8 — Subject: [PATCH] KVM: x86: Traverse all CPUs during recalculate_apic_map --- arch/x86/kvm/lapic.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 9c90d31..6dc2be6 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -139,6 +139,7 @@ static void recalculate_apic_map(struct kvm *kvm) struct kvm_apic_map *new, *old = NULL; struct kvm_vcpu *vcpu; int i; + bool any_enabled
Re: [PATCH 5/7] KVM: arm64: guest debug, add support for single-step
Andrew Jones drjo...@redhat.com writes: On Tue, Nov 25, 2014 at 04:10:03PM +, Alex Bennée wrote: This adds support for single-stepping the guest. As userspace can and will manipulate guest registers before restarting any tweaking of the registers has to occur just before control is passed back to the guest. Furthermore while guest debugging is in effect we need to squash the ability of the guest to single-step itself as we have no easy way of re-entering the guest after the exception has been delivered to the hypervisor. Signed-off-by: Alex Bennée alex.ben...@linaro.org diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 48d26bb..a76daae 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -38,6 +38,7 @@ #include asm/tlbflush.h #include asm/cacheflush.h #include asm/virt.h +#include asm/debug-monitors.h #include asm/kvm_arm.h #include asm/kvm_asm.h #include asm/kvm_mmu.h @@ -300,6 +301,17 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) kvm_arm_set_running_vcpu(NULL); } +/** + * kvm_arch_vcpu_ioctl_set_guest_debug - Setup guest debugging + * @kvm:pointer to the KVM struct + * @kvm_guest_debug: the ioctl data buffer + * + * This sets up the VM for guest debugging. Care has to be taken when + * manipulating guest registers as these will be set/cleared by the + * hyper-visor controller, typically before each kvm_run event. As a + * result modification of the guest registers needs to take place + * after they have been restored in the hyp.S trampoline code. + */ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) { @@ -317,8 +329,8 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, /* Single Step */ if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) { -kvm_info(SS requested, not yet implemented\n); -return -EINVAL; +kvm_info(SS requested\n); +route_el2 = true; } /* Software Break Points */ diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 8da1043..78e5ae1 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -121,6 +121,7 @@ int main(void) DEFINE(VCPU_FAR_EL2, offsetof(struct kvm_vcpu, arch.fault.far_el2)); DEFINE(VCPU_HPFAR_EL2,offsetof(struct kvm_vcpu, arch.fault.hpfar_el2)); DEFINE(VCPU_DEBUG_FLAGS, offsetof(struct kvm_vcpu, arch.debug_flags)); + DEFINE(GUEST_DEBUG, offsetof(struct kvm_vcpu, guest_debug)); DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2)); DEFINE(VCPU_MDCR_EL2, offsetof(struct kvm_vcpu, arch.mdcr_el2)); DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines)); diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c index 28dc92b..6def054 100644 --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -91,6 +91,25 @@ static int kvm_handle_bkpt(struct kvm_vcpu *vcpu, struct kvm_run *run) return 0; } +/** + * kvm_handle_ss - handle single step exceptions + * + * @vcpu: the vcpu pointer same @run comment as other handler header in previous patch Yeah I think I'll be merging them all together given the comments about passing syndrome info directly. + * + * See: ARM ARM D2.12 for the details. While the host is routing debug + * exceptions to it's handlers we have to suppress the ability of the + * guest to trigger exceptions. + */ +static int kvm_handle_ss(struct kvm_vcpu *vcpu, struct kvm_run *run) +{ +WARN_ON(!(vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP)); I'm not sure about this WARN_ON. Is there some scenario you were thinking of when you put it here? Is there some scenario where this could trigger so frequently we kill the log buffer? The main one I had in mind was not suppressing the guest's attempt to step while guest debugging was running. snip -/* for KVM_SET_GUEST_DEBUG */ - -#define KVM_GUESTDBG_ENABLE 0x0001 -#define KVM_GUESTDBG_SINGLESTEP 0x0002 - struct kvm_guest_debug { __u32 control; __u32 pad; @@ -1189,4 +1186,15 @@ struct kvm_assigned_msix_entry { __u16 padding[3]; }; +#endif /* __ASSEMBLY__ */ + +/* for KVM_SET_GUEST_DEBUG */ + +#define KVM_GUESTDBG_ENABLE_SHIFT 0 +#define KVM_GUESTDBG_ENABLE (1 KVM_GUESTDBG_ENABLE_SHIFT) +#define KVM_GUESTDBG_SINGLESTEP_SHIFT 1 +#define KVM_GUESTDBG_SINGLESTEP (1 KVM_GUESTDBG_SINGLESTEP_SHIFT) EALIGN: we can tab these defines up better Sure, I'll clean those up. -- Alex Bennée -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC v3 07/12] dataplane: allow virtio-1 devices
On Wed, 26 Nov 2014 18:28:38 +0100 Cornelia Huck cornelia.h...@de.ibm.com wrote: Handle endianness conversion for virtio-1 virtqueues correctly. Note that dataplane now needs to be built per-target. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com Build is ok now. Acked-by: Greg Kurz gk...@linux.vnet.ibm.com --- hw/block/dataplane/virtio-blk.c |4 +- hw/scsi/virtio-scsi-dataplane.c |2 +- hw/virtio/Makefile.objs |2 +- hw/virtio/dataplane/Makefile.objs |2 +- hw/virtio/dataplane/vring.c | 86 ++--- include/hw/virtio/dataplane/vring-accessors.h | 75 + include/hw/virtio/dataplane/vring.h | 14 +--- 7 files changed, 131 insertions(+), 54 deletions(-) create mode 100644 include/hw/virtio/dataplane/vring-accessors.h diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index 1222a37..2d8cc15 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -16,7 +16,9 @@ #include qemu/iov.h #include qemu/thread.h #include qemu/error-report.h +#include hw/virtio/virtio-access.h #include hw/virtio/dataplane/vring.h +#include hw/virtio/dataplane/vring-accessors.h #include sysemu/block-backend.h #include hw/virtio/virtio-blk.h #include virtio-blk.h @@ -75,7 +77,7 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status) VirtIOBlockDataPlane *s = req-dev-dataplane; stb_p(req-in-status, status); -vring_push(req-dev-dataplane-vring, req-elem, +vring_push(s-vdev, req-dev-dataplane-vring, req-elem, req-qiov.size + sizeof(*req-in)); /* Suppress notification to guest by BH and its scheduled diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c index 03a1e8c..418d73b 100644 --- a/hw/scsi/virtio-scsi-dataplane.c +++ b/hw/scsi/virtio-scsi-dataplane.c @@ -94,7 +94,7 @@ void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req) { VirtIODevice *vdev = VIRTIO_DEVICE(req-vring-parent); -vring_push(req-vring-vring, req-elem, +vring_push(vdev, req-vring-vring, req-elem, req-qsgl.size + req-resp_iov.size); if (vring_should_notify(vdev, req-vring-vring)) { diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs index d21c397..19b224a 100644 --- a/hw/virtio/Makefile.objs +++ b/hw/virtio/Makefile.objs @@ -2,7 +2,7 @@ common-obj-y += virtio-rng.o common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o common-obj-y += virtio-bus.o common-obj-y += virtio-mmio.o -common-obj-$(CONFIG_VIRTIO) += dataplane/ +obj-$(CONFIG_VIRTIO) += dataplane/ obj-y += virtio.o virtio-balloon.o obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o diff --git a/hw/virtio/dataplane/Makefile.objs b/hw/virtio/dataplane/Makefile.objs index 9a8cfc0..753a9ca 100644 --- a/hw/virtio/dataplane/Makefile.objs +++ b/hw/virtio/dataplane/Makefile.objs @@ -1 +1 @@ -common-obj-y += vring.o +obj-y += vring.o diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c index a051775..0da8d6b 100644 --- a/hw/virtio/dataplane/vring.c +++ b/hw/virtio/dataplane/vring.c @@ -18,7 +18,9 @@ #include hw/hw.h #include exec/memory.h #include exec/address-spaces.h +#include hw/virtio/virtio-access.h #include hw/virtio/dataplane/vring.h +#include hw/virtio/dataplane/vring-accessors.h #include qemu/error-report.h /* vring_map can be coupled with vring_unmap or (if you still have the @@ -83,7 +85,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n) vring_init(vring-vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096); vring-last_avail_idx = virtio_queue_get_last_avail_idx(vdev, n); -vring-last_used_idx = vring-vr.used-idx; +vring-last_used_idx = vring_get_used_idx(vdev, vring); vring-signalled_used = 0; vring-signalled_used_valid = false; @@ -104,7 +106,7 @@ void vring_teardown(Vring *vring, VirtIODevice *vdev, int n) void vring_disable_notification(VirtIODevice *vdev, Vring *vring) { if (!(vdev-guest_features[0] (1 VIRTIO_RING_F_EVENT_IDX))) { -vring-vr.used-flags |= VRING_USED_F_NO_NOTIFY; +vring_set_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY); } } @@ -117,10 +119,10 @@ bool vring_enable_notification(VirtIODevice *vdev, Vring *vring) if (vdev-guest_features[0] (1 VIRTIO_RING_F_EVENT_IDX)) { vring_avail_event(vring-vr) = vring-vr.avail-idx; } else { -vring-vr.used-flags = ~VRING_USED_F_NO_NOTIFY; +vring_clear_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY); } smp_mb(); /* ensure update is seen before reading avail_idx */ -return !vring_more_avail(vring); +return !vring_more_avail(vdev, vring); } /* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */ @@
Re: [Qemu-devel] [PATCH RFC v3 05/12] virtio: introduce legacy virtio devices
On Wed, 26 Nov 2014 18:28:36 +0100 Cornelia Huck cornelia.h...@de.ibm.com wrote: Introduce a helper function to indicate whether a virtio device is operating in legacy or virtio standard mode. It may be used to make decisions about the endianess of virtio accesses and other virtio-1 specific changes, enabling us to support transitional devices. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/virtio/virtio.c|6 +- include/hw/virtio/virtio-access.h |4 include/hw/virtio/virtio.h| 13 +++-- 3 files changed, 20 insertions(+), 3 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 2eb5d3c..4149f45 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -883,7 +883,11 @@ static bool virtio_device_endian_needed(void *opaque) VirtIODevice *vdev = opaque; assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); -return vdev-device_endian != virtio_default_endian(); +if (virtio_device_is_legacy(vdev)) { +return vdev-device_endian != virtio_default_endian(); +} +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return vdev-device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE; } Sorry but I still don't understand why we need to stream the device_endian subsection if we have a virtio-1 device... this field is only used on legacy device paths. Can you share an example where it is needed ? static const VMStateDescription vmstate_virtio_device_endian = { diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index 46456fd..c123ee0 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -19,6 +19,10 @@ static inline bool virtio_access_is_big_endian(VirtIODevice *vdev) { +if (!virtio_device_is_legacy(vdev)) { +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return false; +} #if defined(TARGET_IS_BIENDIAN) return virtio_is_big_endian(vdev); #elif defined(TARGET_WORDS_BIGENDIAN) diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index b408166..40e567c 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -275,9 +275,18 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign, void virtio_queue_notify_vq(VirtQueue *vq); void virtio_irq(VirtQueue *vq); +static inline bool virtio_device_is_legacy(VirtIODevice *vdev) +{ +return !(vdev-guest_features[1] (1 (VIRTIO_F_VERSION_1 - 32))); +} + static inline bool virtio_is_big_endian(VirtIODevice *vdev) { -assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); -return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG; +if (virtio_device_is_legacy(vdev)) { +assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); +return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG; +} +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return false; } #endif -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC v3 11/12] virtio-net/virtio-blk: enable virtio 1.0
On Wed, Nov 26, 2014 at 06:28:42PM +0100, Cornelia Huck wrote: virtio-net (non-vhost) and virtio-blk have everything in place to support virtio 1.0: let's enable the feature bit for them. Hmm I doubt that. At least not without more patches. For block, scsi, wce and config-wce must be off, and it must support ANY_LAYOUT (which might be a bit more code). For net, header size is different when mergeable bufs are off, and mac is read-only. Note that VIRTIO_F_VERSION_1 is technically a transport feature; once every device is ready for virtio 1.0, we can move this setting this feature bit out of the individual devices. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/block/virtio-blk.c |4 hw/net/virtio-net.c |4 2 files changed, 8 insertions(+) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 6d86f60..3781f98 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -569,6 +569,10 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, unsigned int index, { VirtIOBlock *s = VIRTIO_BLK(vdev); +if (index == 1) { +features |= (1 (VIRTIO_F_VERSION_1 - 32)); +} + if (index 0) { return features; } diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 1e214b5..fcfc95f 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -447,6 +447,10 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, unsigned int index, VirtIONet *n = VIRTIO_NET(vdev); NetClientState *nc = qemu_get_queue(n-nic); +if (index == 1 !get_vhost_net(nc-peer)) { +features |= (1 (VIRTIO_F_VERSION_1 - 32)); +} + if (index 0) { return features; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH RFC v3 05/12] virtio: introduce legacy virtio devices
On Wed, Nov 26, 2014 at 07:46:38PM +0100, Greg Kurz wrote: On Wed, 26 Nov 2014 18:28:36 +0100 Cornelia Huck cornelia.h...@de.ibm.com wrote: Introduce a helper function to indicate whether a virtio device is operating in legacy or virtio standard mode. It may be used to make decisions about the endianess of virtio accesses and other virtio-1 specific changes, enabling us to support transitional devices. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/virtio/virtio.c|6 +- include/hw/virtio/virtio-access.h |4 include/hw/virtio/virtio.h| 13 +++-- 3 files changed, 20 insertions(+), 3 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 2eb5d3c..4149f45 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -883,7 +883,11 @@ static bool virtio_device_endian_needed(void *opaque) VirtIODevice *vdev = opaque; assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); -return vdev-device_endian != virtio_default_endian(); +if (virtio_device_is_legacy(vdev)) { +return vdev-device_endian != virtio_default_endian(); +} +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return vdev-device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE; } Sorry but I still don't understand why we need to stream the device_endian subsection if we have a virtio-1 device... this field is only used on legacy device paths. Can you share an example where it is needed ? I think it's needed. A transitional device can be used with legacy native endian and modern little endian format. static const VMStateDescription vmstate_virtio_device_endian = { diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index 46456fd..c123ee0 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -19,6 +19,10 @@ static inline bool virtio_access_is_big_endian(VirtIODevice *vdev) { +if (!virtio_device_is_legacy(vdev)) { +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return false; +} #if defined(TARGET_IS_BIENDIAN) return virtio_is_big_endian(vdev); #elif defined(TARGET_WORDS_BIGENDIAN) diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index b408166..40e567c 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -275,9 +275,18 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign, void virtio_queue_notify_vq(VirtQueue *vq); void virtio_irq(VirtQueue *vq); +static inline bool virtio_device_is_legacy(VirtIODevice *vdev) +{ +return !(vdev-guest_features[1] (1 (VIRTIO_F_VERSION_1 - 32))); +} + static inline bool virtio_is_big_endian(VirtIODevice *vdev) { -assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); -return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG; +if (virtio_device_is_legacy(vdev)) { +assert(vdev-device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); +return vdev-device_endian == VIRTIO_DEVICE_ENDIAN_BIG; +} +/* Devices conforming to VIRTIO 1.0 or later are always LE. */ +return false; } #endif -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7] KVM: arm64: guest debug, add support for single-step
On 25 November 2014 at 16:10, Alex Bennée alex.ben...@linaro.org wrote: This adds support for single-stepping the guest. As userspace can and will manipulate guest registers before restarting any tweaking of the registers has to occur just before control is passed back to the guest. Furthermore while guest debugging is in effect we need to squash the ability of the guest to single-step itself as we have no easy way of re-entering the guest after the exception has been delivered to the hypervisor. A corner case I don't think this patch handles: if the debugger tries to single step an insn which is emulated by the hypervisor (because it's a load/store which is trapped and handled as emulated mmio in userspace) then we won't correctly update the single-step state machine (and so we'll end up incorrectly stopping after the following insn rather than before, I think). You should be able to achieve this effect by simply always clearing the guest's PSTATE.SS when you advance the PC to skip the emulated instruction (cf the comment in the pseudocode SSAdvance() function). I think we should also be doing this PC advance on return from userspace's handling of the mmio rather than before we drop back to userspace as we do now, but I can't remember why I think that. Christoffer, I don't suppose you recall, do you? I think it was you I had this conversation with on IRC a month or so back... -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2] x86: Test illegal movbe
Previously KVM ignored the mod field of MOVBE instruction, so MOVBE from register to register succeeds, although it should fail (cause a #UD exception). This test check that a #UD is indeed delivered upon such MOVBE. The test would not work if MOVBE is unsupported. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- v1-v2: Add missing return upon no support for MOVBE --- x86/emulator.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 5aa4dbf..567e421 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -1051,6 +1051,29 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void illegal_movbe_handler(struct ex_regs *regs) +{ + extern char bad_movbe_cont; + + ++exceptions; + regs-rip = (ulong)bad_movbe_cont; +} + +static void test_illegal_movbe(void) +{ + if (!(cpuid(1).c (1 22))) { + printf(SKIP: illegal movbe\n); + return; + } + + exceptions = 0; + handle_exception(UD_VECTOR, illegal_movbe_handler); + asm volatile(.byte 0x0f; .byte 0x38; .byte 0xf0; .byte 0xc0;\n\t + bad_movbe_cont: : : : rax); + report(illegal movbe, exceptions == 1); + handle_exception(UD_VECTOR, 0); +} + int main() { void *mem; @@ -1119,6 +1142,7 @@ int main() test_string_io_mmio(mem); test_jmp_noncanonical(mem); + test_illegal_movbe(); return report_summary(); } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT
From: Luis R. Rodriguez mcg...@suse.com Some folks had reported that some xen hypercalls take a long time to complete when issued from the userspace private ioctl mechanism, this can happen for instance with some hypercalls that have many sub-operations, this can happen for instance on hypercalls that use multi-call feature whereby Xen lets one hypercall batch out a series of other hypercalls on the hypervisor. At times such hypercalls can even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default 120 seconds), this a non-issue issue on preemptible kernels though as the kernel may deschedule such long running tasks. Xen for instance supports multicalls to be preempted as well, this is what Xen calls continuation (see xen commit 42217cbc5b which introduced this [0]). On systems without CONFIG_PREEMPT though -- a kernel with voluntary or no preemption -- a long running hypercall will not be descheduled until the hypercall is complete and the ioctl returns to user space. To help with this David had originally implemented support for use of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This solution never went upstream though and upon review to help refactor this I've concluded that usage of preempt_schedule_irq() would be a bit abussive of existing APIs -- for a few reasons: 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels 1) we want try to consider solutions that might work for other hypervisors for this same problem, and identify it its an issue even present on other hypervisors or if this is a self inflicted architectural issue caused by use of multicalls 2) there is no documentation or profiling of the exact hypercalls that were causing these issues, nor do we have any context to help evaluate this any further I at least checked with kvm folks and it seems hypercall preemption is not needed there. We can survey other hypervisors... If 'something like preemption' is needed then CONFIG_PREEMPT should just be enabled and encouraged, it seems we want to encourage CONFIG_PREEMPT on xen, specially when multicalls are used. In the meantime this tries to address a solution to help xen on non CONFIG_PREEMPT kernels. One option tested and evaluated was to put private hypercalls in process context, however this would introduce complexities such originating hypercalls from different contexts. Current xen hypercall callback handlers would need to be changed per architecture, for instance, we'd also incur the cost of switching states from user / kernel (this cost is also present if preempt_schedule_irq() is used). There may be other issues which could be introduced with this strategy as well. The simplest *shared* alternative is instead to just explicitly schedule() at the end of a private hypercall on non preempt kernels. This forces our private hypercall call mechanism to try to be fair only on non CONFIG_PREEMPT kernels at the cost of more context switch but keeps the private hypercall context intact. [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9 [1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch Cc: Davidlohr Bueso dbu...@suse.de Cc: Joerg Roedel jroe...@suse.de Cc: Borislav Petkov b...@suse.de Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Jan Beulich jbeul...@suse.com Cc: Juergen Gross jgr...@suse.com Cc: Olaf Hering oher...@suse.de Cc: David Vrabel david.vra...@citrix.com Signed-off-by: Luis R. Rodriguez mcg...@suse.com --- drivers/xen/privcmd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 569a13b..e29edba 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata) hypercall.arg[0], hypercall.arg[1], hypercall.arg[2], hypercall.arg[3], hypercall.arg[4]); +#ifndef CONFIG_PREEMPT + schedule(); +#endif return ret; } -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: PPC: BOOK3S: HV: Rename variable for better readability
On Mon, Oct 20, 2014 at 07:59:00PM +0530, Aneesh Kumar K.V wrote: Minor cleanup Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 78e689b066f1..2922f8d127ff 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -523,7 +523,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu) unsigned long *args = vcpu-arch.gpr[4]; __be64 *hp, *hptes[4]; unsigned long tlbrb[4]; - long int i, j, k, n, found, indexes[4]; + long int i, j, k, collected_hpte, found, indexes[4]; Hmmm... I don't find it more readable, because collected_hpte sounds like it contains a HPTE value. Also I don't like using a long name for something that is just a temporary value inside a function, and n is a suitable name for a temporary variable counting the number of things we have. I would prefer just adding a comment like this: - n = 0; + n = 0; /* # values collected in tlbrb[], indexes[] etc. */ Paul. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[question] what is virtio-1 device?
Hi, what is virtio-1 device? Thanks, Zhang Haoyu -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [question] what is virtio-1 device?
On 2014/11/27 9:26, Zhang Haoyu wrote: Hi, what is virtio-1 device? Are you mean subsystem device id is '1'? That should be a network card. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [question] what is virtio-1 device?
Hi, what is virtio-1 device? Are you mean subsystem device id is '1'? That should be a network card. In the mail qemu: towards virtio-1 host support, virtio: allow virtio-1 queue layout, dataplane: allow virtio-1 devices, etc., virtio-1 devices was mentioned, I don't know which devices it incorporates? Thanks, Zhang Haoyu Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [question] what is virtio-1 device?
On 2014/11/27 9:46, Zhang Haoyu wrote: Hi, what is virtio-1 device? Are you mean subsystem device id is '1'? That should be a network card. In the mail qemu: towards virtio-1 host support, virtio: allow virtio-1 queue layout, dataplane: allow virtio-1 devices, etc., virtio-1 devices was mentioned, I don't know which devices it incorporates? That should mean they're carrying forward virtio based on latest Virtual I/O Device (VIRTIO) Version 1.0. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [question] what is virtio-1 device?
Hi, what is virtio-1 device? Are you mean subsystem device id is '1'? That should be a network card. In the mail qemu: towards virtio-1 host support, virtio: allow virtio-1 queue layout, dataplane: allow virtio-1 devices, etc., virtio-1 devices was mentioned, I don't know which devices it incorporates? That should mean they're carrying forward virtio based on latest Virtual I/O Device (VIRTIO) Version 1.0. Got it, thanks. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT
On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote: From: Luis R. Rodriguez mcg...@suse.com Some folks had reported that some xen hypercalls take a long time to complete when issued from the userspace private ioctl mechanism, this can happen for instance with some hypercalls that have many sub-operations, this can happen for instance on hypercalls that use multi-call feature whereby Xen lets one hypercall batch out a series of other hypercalls on the hypervisor. At times such hypercalls can even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default 120 seconds), this a non-issue issue on preemptible kernels though as the kernel may deschedule such long running tasks. Xen for instance supports multicalls to be preempted as well, this is what Xen calls continuation (see xen commit 42217cbc5b which introduced this [0]). On systems without CONFIG_PREEMPT though -- a kernel with voluntary or no preemption -- a long running hypercall will not be descheduled until the hypercall is complete and the ioctl returns to user space. To help with this David had originally implemented support for use of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This solution never went upstream though and upon review to help refactor this I've concluded that usage of preempt_schedule_irq() would be a bit abussive of existing APIs -- for a few reasons: 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels 1) we want try to consider solutions that might work for other hypervisors for this same problem, and identify it its an issue even present on other hypervisors or if this is a self inflicted architectural issue caused by use of multicalls 2) there is no documentation or profiling of the exact hypercalls that were causing these issues, nor do we have any context to help evaluate this any further I at least checked with kvm folks and it seems hypercall preemption is not needed there. We can survey other hypervisors... If 'something like preemption' is needed then CONFIG_PREEMPT should just be enabled and encouraged, it seems we want to encourage CONFIG_PREEMPT on xen, specially when multicalls are used. In the meantime this tries to address a solution to help xen on non CONFIG_PREEMPT kernels. One option tested and evaluated was to put private hypercalls in process context, however this would introduce complexities such originating hypercalls from different contexts. Current xen hypercall callback handlers would need to be changed per architecture, for instance, we'd also incur the cost of switching states from user / kernel (this cost is also present if preempt_schedule_irq() is used). There may be other issues which could be introduced with this strategy as well. The simplest *shared* alternative is instead to just explicitly schedule() at the end of a private hypercall on non preempt kernels. This forces our private hypercall call mechanism to try to be fair only on non CONFIG_PREEMPT kernels at the cost of more context switch but keeps the private hypercall context intact. [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9 [1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch Cc: Davidlohr Bueso dbu...@suse.de Cc: Joerg Roedel jroe...@suse.de Cc: Borislav Petkov b...@suse.de Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Jan Beulich jbeul...@suse.com Cc: Juergen Gross jgr...@suse.com Cc: Olaf Hering oher...@suse.de Cc: David Vrabel david.vra...@citrix.com Signed-off-by: Luis R. Rodriguez mcg...@suse.com --- drivers/xen/privcmd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 569a13b..e29edba 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata) hypercall.arg[0], hypercall.arg[1], hypercall.arg[2], hypercall.arg[3], hypercall.arg[4]); +#ifndef CONFIG_PREEMPT + schedule(); +#endif return ret; } Sorry, I don't think this will solve anything. You're calling schedule() right after the long running hypercall just nanoseconds before returning to the user. I suppose you were mislead by the int 0x82 in [0]. This is the hypercall from the kernel into the hypervisor, e.g. inside of privcmd_call(). Juergen -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] vhost: remove unnecessary forward declarations in vhost.h
Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/vhost/vhost.h | 4 1 file changed, 4 deletions(-) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 3eda654..7d039ef 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -12,8 +12,6 @@ #include linux/virtio_ring.h #include linux/atomic.h -struct vhost_device; - struct vhost_work; typedef void (*vhost_work_fn_t)(struct vhost_work *work); @@ -54,8 +52,6 @@ struct vhost_log { u64 len; }; -struct vhost_virtqueue; - /* The virtqueue structure describes a queue attached to a device. */ struct vhost_virtqueue { struct vhost_dev *dev; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [question] lots of interrupts injected to vm when pressing some key w/o releasing
I tested win-server-2008 with -cpu core2duo,hv_spinlocks=0x,hv_relaxed,hv_time, this problem still happened, about 200,000 vmexits per-second, bringing very bad experience, just like being stuck. Please upload a full trace somewhere, or at least the perf report output. And, if I remove the commit of 0bc830b0, the problem disappeared. Please send the full trace file. If you compress it, it should be small. See the attach 1, please. Paolo Can you try the follow draft patch to see whether it solve your problem? This patch is based on commit 0bc830b0. After applying this patch, VM got stuck with black-screen at boot stage, # trace-cmd report: version = 6 CPU 0 is empty CPU 1 is empty CPU 2 is empty CPU 3 is empty CPU 5 is empty CPU 7 is empty cpus=8 kvm-1266 [004] 14399.834397: kvm_set_irq: gsi 9 level 1 source 0 kvm-1266 [004] 14399.834403: kvm_pic_set_irq: chip 1 pin 1 (edge|masked) kvm-1266 [004] 14399.834411: kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level) kvm-1266 [004] 14399.834412: kvm_ioapic_set_irq: pin 9 dst 3 vec=177 (LowPrio|logical|level) kvm-1266 [004] 14402.180013: kvm_set_irq: gsi 9 level 1 source 0 kvm-1266 [004] 14402.180019: kvm_pic_set_irq: chip 1 pin 1 (edge|masked) kvm-1266 [004] 14402.180028: kvm_apic_accept_irq: apicid 1 vec 177 (LowPrio|level) kvm-1266 [004] 14402.180029: kvm_ioapic_set_irq: pin 9 dst 3 vec=177 (LowPrio|logical|level) kvm-1266 [004] 14404.525627: kvm_set_irq: gsi 9 level 1 source 0 kvm-1266 [004] 14404.525634: kvm_pic_set_irq: chip 1 pin 1 (edge|masked) kvm-1266 [004] 14404.525641: kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level) kvm-1266 [004] 14404.525642: kvm_ioapic_set_irq: pin 9 dst 3 vec=177 (LowPrio|logical|level) kvm-1266 [004] 14406.871238: kvm_set_irq: gsi 9 level 1 source 0 kvm-1266 [004] 14406.871245: kvm_pic_set_irq: chip 1 pin 1 (edge|masked) kvm-1266 [004] 14406.871254: kvm_apic_accept_irq: apicid 1 vec 177 (LowPrio|level) kvm-1266 [004] 14406.871256: kvm_ioapic_set_irq: pin 9 dst 3 vec=177 (LowPrio|logical|level) kvm-1266 [006] 14409.216849: kvm_set_irq: gsi 9 level 1 source 0 kvm-1266 [006] 14409.216855: kvm_pic_set_irq: chip 1 pin 1 (edge|masked) kvm-1266 [006] 14409.216862: kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level) kvm-1266 [006] 14409.216863: kvm_ioapic_set_irq: pin 9 dst 3 vec=177 (LowPrio|logical|level) kvm-1266 [006] 14411.562475: kvm_set_irq: gsi 9 level 1 source 0 kvm-1266 [006] 14411.562481: kvm_pic_set_irq: chip 1 pin 1 (edge|masked) kvm-1266 [006] 14411.562489: kvm_apic_accept_irq: apicid 1 vec 177 (LowPrio|level) kvm-1266 [006] 14411.562491: kvm_ioapic_set_irq: pin 9 dst 3 vec=177 (LowPrio|logical|level) kvm-1266 [004] 14413.908074: kvm_set_irq: gsi 9 level 1 source 0 kvm-1266 [004] 14413.908080: kvm_pic_set_irq: chip 1 pin 1 (edge|masked) kvm-1266 [004] 14413.908088: kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level) kvm-1266 [004] 14413.908089: kvm_ioapic_set_irq: pin 9 dst 3 vec=177 (LowPrio|logical|level) Thanks, Zhang Haoyu diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 25e16a6..8f4e211 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -305,6 +305,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status) return ret; } +static int irq_status[256]; int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id, int level, bool line_status) { @@ -312,10 +313,13 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id, u32 mask = 1 irq; union kvm_ioapic_redirect_entry entry; int ret, irq_level; + int old_irq; BUG_ON(irq 0 || irq = IOAPIC_NUM_PINS); spin_lock(ioapic-lock); + old_irq = irq_status[irq]; + irq_status[irq] = level; old_irr = ioapic-irr; irq_level = __kvm_irq_line_state(ioapic-irq_states[irq], irq_source_id, level); @@ -332,6 +336,10 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id, goto out; } ioapic-irr |= mask; + if (edge old_irq == irq_status[irq] old_irq) { + ret = 0; + goto out; + } if ((edge old_irr != ioapic-irr) || (!edge !entry.fields.remote_irr)) ret =
Re: [PATCH 3/3] KVM: PPC: BOOK3S: HV: Rename variable for better readability
On Mon, Oct 20, 2014 at 07:59:00PM +0530, Aneesh Kumar K.V wrote: Minor cleanup Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 78e689b066f1..2922f8d127ff 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -523,7 +523,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu) unsigned long *args = vcpu-arch.gpr[4]; __be64 *hp, *hptes[4]; unsigned long tlbrb[4]; - long int i, j, k, n, found, indexes[4]; + long int i, j, k, collected_hpte, found, indexes[4]; Hmmm... I don't find it more readable, because collected_hpte sounds like it contains a HPTE value. Also I don't like using a long name for something that is just a temporary value inside a function, and n is a suitable name for a temporary variable counting the number of things we have. I would prefer just adding a comment like this: - n = 0; + n = 0; /* # values collected in tlbrb[], indexes[] etc. */ Paul. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html