[GIT PULL 0/2] KVM: s390: Fix and cleanup for 4.2 (kvm/next)
Paolo,

one fix and one cleanup for 4.2.

The following changes since commit 06b36753a6466239fc945b6756e12d635621ae5f:

  KVM: s390: drop handling of interception code 12 (2015-05-08 15:51:17 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-next-20150602

for you to fetch changes up to ea2cdd27dce66dc498c623adacd807ea3a350443:

  KVM: s390: introduce KMSG_COMPONENT for kvm-s390 (2015-06-02 09:39:20 +0200)

KVM: s390: Fix and cleanup for 4.2 (kvm/next)

One small fix for a commit targeted for 4.2 and one cleanup regarding our printks.

David Hildenbrand (2):
      KVM: s390: call exit_sie() directly on vcpu block/request
      KVM: s390: introduce KMSG_COMPONENT for kvm-s390

 arch/s390/kvm/kvm-s390.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 1/2] KVM: s390: call exit_sie() directly on vcpu block/request
From: David Hildenbrand <d...@linux.vnet.ibm.com>

Thinking about it, I can't find a real use case where we want to block a VCPU and not kick it out of SIE (except if we want to do the same in batch for multiple VCPUs - but that's a micro optimization).

So let's simply perform the exit_sie() calls directly when setting the other magic block bits in the SIE. Otherwise e.g. kvm_s390_set_tod_low() still has other VCPUs running after that call, working with a wrong epoch.

Fixes: 27406cd50c ("KVM: s390: provide functions for blocking all CPUs")
Acked-by: Christian Borntraeger <borntrae...@de.ibm.com>
Signed-off-by: David Hildenbrand <d...@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6bc69ab..f37557b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1417,6 +1417,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 void kvm_s390_vcpu_block(struct kvm_vcpu *vcpu)
 {
 	atomic_set_mask(PROG_BLOCK_SIE, &vcpu->arch.sie_block->prog20);
+	exit_sie(vcpu);
 }
 
 void kvm_s390_vcpu_unblock(struct kvm_vcpu *vcpu)
@@ -1427,6 +1428,7 @@ void kvm_s390_vcpu_unblock(struct kvm_vcpu *vcpu)
 static void kvm_s390_vcpu_request(struct kvm_vcpu *vcpu)
 {
 	atomic_set_mask(PROG_REQUEST, &vcpu->arch.sie_block->prog20);
+	exit_sie(vcpu);
 }
 
 static void kvm_s390_vcpu_request_handled(struct kvm_vcpu *vcpu)
@@ -1450,7 +1452,6 @@ void kvm_s390_sync_request(int req, struct kvm_vcpu *vcpu)
 {
 	kvm_make_request(req, vcpu);
 	kvm_s390_vcpu_request(vcpu);
-	exit_sie(vcpu);
 }
 
 static void kvm_gmap_notifier(struct gmap *gmap, unsigned long address)
-- 
2.3.0
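The invariant the fix enforces — setting a blocking bit must always be paired with kicking the vCPU out of SIE, so no vCPU keeps running with stale state such as a wrong epoch — can be modeled in a small userspace sketch. The `struct vcpu` and `exit_sie()` below are hypothetical stand-ins for the kernel primitives; only the combined set-bit-then-kick ordering is the point.

```c
#include <stdatomic.h>

#define PROG_BLOCK_SIE (1u << 0)
#define PROG_REQUEST   (1u << 1)

/* Hypothetical userspace stand-ins; the real sie_block/exit_sie() differ. */
struct vcpu {
	atomic_uint prog20; /* blocking/request bits, as in sie_block->prog20 */
	int in_sie;         /* 1 while the vCPU runs guest code */
};

static void exit_sie(struct vcpu *v)
{
	v->in_sie = 0; /* kick the vCPU out of SIE */
}

/* After the patch, setting a blocking bit and kicking the vCPU are one
 * operation, so callers like kvm_s390_set_tod_low() cannot race with a
 * still-running guest. */
static void vcpu_block(struct vcpu *v)
{
	atomic_fetch_or(&v->prog20, PROG_BLOCK_SIE);
	exit_sie(v);
}

static void vcpu_request(struct vcpu *v)
{
	atomic_fetch_or(&v->prog20, PROG_REQUEST);
	exit_sie(v);
}
```

With this shape, `kvm_s390_sync_request()` no longer needs its own `exit_sie()` call, which is exactly what the patch removes.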
[GIT PULL 2/2] KVM: s390: introduce KMSG_COMPONENT for kvm-s390
From: David Hildenbrand <d...@linux.vnet.ibm.com>

Let's remove "kvm-s390" from our printk messages and make use of pr_fmt instead. Also replace one printk() occurrence by an equivalent pr_warn on the way.

Suggested-by: Christian Borntraeger <borntrae...@de.ibm.com>
Signed-off-by: David Hildenbrand <d...@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f37557b..9cb6cfaa 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -36,6 +36,10 @@
 #include "kvm-s390.h"
 #include "gaccess.h"
 
+#define KMSG_COMPONENT "kvm-s390"
+#undef pr_fmt
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 #include "trace-s390.h"
@@ -2086,7 +2090,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	if (!kvm_s390_user_cpu_state_ctrl(vcpu->kvm)) {
 		kvm_s390_vcpu_start(vcpu);
 	} else if (is_vcpu_stopped(vcpu)) {
-		pr_err_ratelimited("kvm-s390: can't run stopped vcpu %d\n",
+		pr_err_ratelimited("can't run stopped vcpu %d\n",
 				   vcpu->vcpu_id);
 		return -EINVAL;
 	}
@@ -2617,7 +2621,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	rc = gmap_map_segment(kvm->arch.gmap, mem->userspace_addr,
 		mem->guest_phys_addr, mem->memory_size);
 	if (rc)
-		printk(KERN_WARNING "kvm-s390: failed to commit memory region\n");
+		pr_warn("failed to commit memory region\n");
 	return;
 }
-- 
2.3.0
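The pr_fmt mechanism this patch adopts is compile-time string pasting: every `pr_*` call site has the component prefix glued onto its format string, so individual messages no longer repeat "kvm-s390: " by hand. A minimal userspace sketch of the idea — the `pr_warn` macro here is a toy writing into a caller-supplied buffer, not the kernel's printk-based macro:

```c
#include <stdio.h>

#define KMSG_COMPONENT "kvm-s390"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt

/* Toy pr_warn: formats into a buffer instead of the kernel log, but
 * prefixes the message exactly the way the kernel macro would. */
#define pr_warn(buf, n, fmt, ...) snprintf(buf, n, pr_fmt(fmt), ##__VA_ARGS__)
```

So `pr_warn(buf, sizeof(buf), "failed to commit memory region\n")` produces "kvm-s390: failed to commit memory region\n", and removing the hand-written prefix from each message (as the patch does) cannot drift out of sync with the component name.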
Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:

On 05/30/2015 11:59 PM, Christoffer Dall wrote:

Hi Mario,

On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:

On 05/28/2015 11:49 AM, Christoffer Dall wrote:

Until now we have been calling kvm_guest_exit after re-enabling interrupts when we come back from the guest, but this has the unfortunate effect that CPU time accounting done in the context of timer interrupts occurring while the guest is running doesn't properly notice that the time since the last tick was spent in the guest.

Inspired by the comment in the x86 code, move the kvm_guest_exit() call below the local_irq_enable() call and change __kvm_guest_exit() to kvm_guest_exit(), because we are now calling this function with interrupts enabled. We have to now explicitly disable preemption and not enable preemption before we've called kvm_guest_exit(), since otherwise we could be preempted and everything happening before we eventually get scheduled again would be accounted for as guest time.

At the same time, move the trace_kvm_exit() call outside of the atomic section, since there is no reason for us to do that with interrupts disabled.

Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
---
This patch is based on kvm/queue, because it has the kvm_guest_enter/exit rework recently posted by Christian Borntraeger. I hope I got the logic of this right, there were 2 slightly worrying facts about this:

First, we now enable and disable and enable interrupts on each exit path, but I couldn't see any performance overhead on hackbench - yes the only benchmark we care about.

Second, looking at the ppc and mips code, they seem to also call kvm_guest_exit() before enabling interrupts, so I don't understand how guest CPU time accounting works on those architectures.

Changes since v1:
 - Tweak comment and commit text based on Marc's feedback.
 - Explicitly disable preemption and enable it only after kvm_guest_exit().
 arch/arm/kvm/arm.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e41cb11..fe8028d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_vgic_flush_hwstate(vcpu);
 		kvm_timer_flush_hwstate(vcpu);
 
+		preempt_disable();
 		local_irq_disable();
 
 		/*
@@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
 			local_irq_enable();
+			preempt_enable();
 			kvm_timer_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			continue;
@@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
-		__kvm_guest_exit();
-		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+		/*
+		 * Back from guest
+		 */
+
 		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
@@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		local_irq_enable();
 
 		/*
-		 * Back from guest
-		 */
+		 * We do local_irq_enable() before calling kvm_guest_exit() so
+		 * that if a timer interrupt hits while running the guest we
+		 * account that tick as being spent in the guest. We enable
+		 * preemption after calling kvm_guest_exit() so that if we get
+		 * preempted we make sure ticks after that is not counted as
+		 * guest time.
+		 */
+		kvm_guest_exit();
+		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+		preempt_enable();
+
 		kvm_timer_sync_hwstate(vcpu);
 		kvm_vgic_sync_hwstate(vcpu);

Hi Christoffer,

so currently we take a snap shot when we enter the guest (tsk->vtime_snap) and upon exit add the time we spent in the guest and update accrued time, which appears correct.

not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN. Or am I missing something obvious here?
I see what you mean: we can't use cycle-based accounting to accrue guest time. See the other thread; we can enable this in the config, but it still only works with NO_HZ_FULL. With this patch it appears that interrupts running in host mode are accrued to
Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
On Mon, Jun 01, 2015 at 03:37:32PM +0200, Christian Borntraeger wrote:

On 01.06.2015 at 15:35, Christoffer Dall wrote:

On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:

On 01.06.2015 at 11:08, Christoffer Dall wrote:

Second, looking at the ppc and mips code, they seem to also call kvm_guest_exit() before enabling interrupts, so I don't understand how guest CPU time accounting works on those architectures.

Not an expert here, but I assume mips has the same logic as arm, so if your patch is right for arm it's probably also right for mips. powerpc looks similar to what s390 does (not using the tick; instead it uses a hw-timer), so this should be fine. I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get this for free, which would avoid the need for this patch?

Assuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similarly to HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not relying on ticks - yes, it might work out. Can you give it a try?

Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels like a fix, since it would be a shame to force users to use this config option to report CPU usage correctly. I'm not entirely sure what the history and meaning behind these configs are, so maybe there is an entirely different rework needed here. It seems logical that you could simply sample the counter at entry/exit of the guest, but if there is nowhere to store this data without NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?

Given Paolo's response that irq_disable/enable is faster than save/restore, at least on x86, your v2 patch might actually be the right thing to do.

Thanks, I think so too, but we should enable HAVE_VIRT_CPU_ACCOUNTING_GEN for arm64 as well, assuming there are no mysterious side effects of doing so.
-Christoffer
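The ordering argument in this thread can be made concrete with a toy model of tick-based accounting: a timer tick is charged to whichever context an `in_guest` flag indicates when the tick handler runs. All names below are illustrative, not kernel APIs; the sketch only demonstrates why clearing the flag before interrupts are re-enabled (the old arm ordering) misattributes guest ticks to the host.

```c
/* Toy tick-based CPU time accounting: each tick is charged to guest or
 * host depending on a flag, mimicking what a timer interrupt would see. */
struct acct {
	int in_guest;
	unsigned long guest_ticks, host_ticks;
};

static void timer_tick(struct acct *a)
{
	if (a->in_guest)
		a->guest_ticks++;
	else
		a->host_ticks++;
}

/* One guest stint during which a timer interrupt became pending.
 * exit_first models the old ordering (__kvm_guest_exit() with IRQs off,
 * pending tick delivered only after local_irq_enable()); otherwise the
 * patch's ordering (IRQs enabled, tick delivered, then kvm_guest_exit()). */
static void guest_stint(struct acct *a, int exit_first)
{
	a->in_guest = 1;     /* kvm_guest_enter() */
	if (exit_first) {
		a->in_guest = 0; /* old: exit accounting while IRQs still off */
		timer_tick(a);   /* pending tick fires: charged to host */
	} else {
		timer_tick(a);   /* pending tick fires: charged to guest */
		a->in_guest = 0; /* new: kvm_guest_exit() afterwards */
	}
}
```

The preempt_disable/enable pair in the patch closes the remaining hole: without it, the task could be scheduled out between the tick and `kvm_guest_exit()`, and everything until it runs again would count as guest time.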
Update on Split Irqchip Patches
Hi All,

I just sent out a new version of the patches that enable a split irqchip. I've tested them against Google's VMM, and the updated patches boot Windows and pass the KVM unit tests. It's mostly the same as before, with the tweaks suggested against the first version I sent out.

A new Google intern (Andrew Liu) is currently looking into patching QEMU to work with this patch set, which is pretty exciting.

Are there any changes/updates to this patch set that are necessary before it could be merged?

Cheers,
Steve
[PATCH v3 1/4] KVM: x86: Split the APIC from the rest of IRQCHIP.
First patch in a series which enables the relocation of the PIC/IOAPIC to userspace. Adds capability KVM_CAP_SPLIT_IRQCHIP; KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the rest of the irqchip. Compile tested for x86. Signed-off-by: Steve Rutherford srutherf...@google.com Suggested-by: Andrew Honig aho...@google.com --- Documentation/virtual/kvm/api.txt | 15 arch/powerpc/kvm/irq.h| 5 arch/s390/kvm/irq.h | 4 arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/assigned-dev.c | 4 ++-- arch/x86/kvm/irq.c| 6 ++--- arch/x86/kvm/irq.h| 11 + arch/x86/kvm/irq_comm.c | 7 ++ arch/x86/kvm/lapic.c | 13 +++ arch/x86/kvm/mmu.c| 2 +- arch/x86/kvm/svm.c| 4 ++-- arch/x86/kvm/vmx.c| 12 +- arch/x86/kvm/x86.c| 49 +++ include/kvm/arm_vgic.h| 1 + include/linux/kvm_host.h | 1 + include/uapi/linux/kvm.h | 1 + virt/kvm/irqchip.c| 2 +- 17 files changed, 104 insertions(+), 35 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6955444..9a43d42 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2979,6 +2979,7 @@ len must be a multiple of sizeof(struct kvm_s390_irq). It must be 0 and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq), which is the maximum number of possibly pending cpu-local interrupts. + 5. The kvm_run structure @@ -3575,6 +3576,20 @@ struct { KVM handlers should exit to userspace with rc = -EREMOTE. +7.5 KVM_SPLIT_IRQCHIP + +Capability: KVM_CAP_SPLIT_IRQCHIP +Architectures: x86 +Type: VM ioctl +Parameters: None +Returns: 0 on success, -1 on error + +Create a local apic for each processor in the kernel. This differs from +KVM_CREATE_IRQCHIP in that it only creates the local apic; it creates neither +the ioapic nor the pic in the kernel. Also, enables in kernel routing of +interrupt requests. Fails if VCPU has already been created, or if the irqchip is +already in the kernel. + 8. Other capabilities. 
-- diff --git a/arch/powerpc/kvm/irq.h b/arch/powerpc/kvm/irq.h index 5a9a10b..5e6fa06 100644 --- a/arch/powerpc/kvm/irq.h +++ b/arch/powerpc/kvm/irq.h @@ -17,4 +17,9 @@ static inline int irqchip_in_kernel(struct kvm *kvm) return ret; } +static inline int lapic_in_kernel(struct kvm *kvm) +{ + return irqchip_in_kernel(kvm); +} + #endif diff --git a/arch/s390/kvm/irq.h b/arch/s390/kvm/irq.h index d98e415..db876c3 100644 --- a/arch/s390/kvm/irq.h +++ b/arch/s390/kvm/irq.h @@ -19,4 +19,8 @@ static inline int irqchip_in_kernel(struct kvm *kvm) return 1; } +static inline int lapic_in_kernel(struct kvm *kvm) +{ + return irqchip_in_kernel(kvm); +} #endif diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7276107..af3225a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -639,6 +639,8 @@ struct kvm_arch { bool boot_vcpu_runs_old_kvmclock; u64 disabled_quirks; + + bool irqchip_split; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/assigned-dev.c b/arch/x86/kvm/assigned-dev.c index d090ecf..1237e92 100644 --- a/arch/x86/kvm/assigned-dev.c +++ b/arch/x86/kvm/assigned-dev.c @@ -291,7 +291,7 @@ static int kvm_deassign_irq(struct kvm *kvm, { unsigned long guest_irq_type, host_irq_type; - if (!irqchip_in_kernel(kvm)) + if (!lapic_in_kernel(kvm)) return -EINVAL; /* no irq assignment to deassign */ if (!assigned_dev-irq_requested_type) @@ -568,7 +568,7 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm, struct kvm_assigned_dev_kernel *match; unsigned long host_irq_type, guest_irq_type; - if (!irqchip_in_kernel(kvm)) + if (!lapic_in_kernel(kvm)) return r; mutex_lock(kvm-lock); diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index a1ec6a50..706e47a 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -57,7 +57,7 @@ static int kvm_cpu_has_extint(struct kvm_vcpu *v) */ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v) { - if (!irqchip_in_kernel(v-kvm)) + if (!lapic_in_kernel(v-kvm)) return 
v->arch.interrupt.pending;
 
 	if (kvm_cpu_has_extint(v))
@@ -75,7 +75,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
  */
 int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 {
-	if (!irqchip_in_kernel(v->kvm))
+	if (!lapic_in_kernel(v->kvm))
 		return v->arch.interrupt.pending;
 
 	if (kvm_cpu_has_extint(v))
@@ -103,7 +103,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 {
 	int vector;
 
-	if
[PATCH v3 3/4] KVM: x86: Add EOI exit bitmap inference
In order to support a userspace IOAPIC interacting with an in kernel APIC, the EOI exit bitmaps need to be configurable. If the IOAPIC is in userspace (i.e. the irqchip has been split), the EOI exit bitmaps will be set whenever the GSI Routes are configured. In particular, for the low MSI routes are reservable for userspace IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the destination vector of the route will be set for the destination VCPU. The intention is for the userspace IOAPICs to use the reservable MSI routes to inject interrupts into the guest. This is a slight abuse of the notion of an MSI Route, given that MSIs classically bypass the IOAPIC. It might be worthwhile to add an additional route type to improve clarity. Compile tested for Intel x86. Signed-off-by: Steve Rutherford srutherf...@google.com --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/ioapic.c | 16 arch/x86/kvm/ioapic.h | 2 ++ arch/x86/kvm/lapic.c| 3 +-- arch/x86/kvm/x86.c | 30 ++ include/linux/kvm_host.h| 9 + virt/kvm/irqchip.c | 37 + 7 files changed, 88 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 2778d36..4f439ff 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -644,6 +644,7 @@ struct kvm_arch { u64 disabled_quirks; bool irqchip_split; + u8 nr_reserved_ioapic_pins; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c index 856f791..fb5281b 100644 --- a/arch/x86/kvm/ioapic.c +++ b/arch/x86/kvm/ioapic.c @@ -672,3 +672,19 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state) spin_unlock(ioapic-lock); return 0; } + +void kvm_arch_irq_routing_update(struct kvm *kvm) +{ + struct kvm_ioapic *ioapic = kvm-arch.vioapic; + + if (ioapic) + return; + if (!lapic_in_kernel(kvm)) + return; + kvm_make_scan_ioapic_request(kvm); +} + +u8 kvm_arch_nr_userspace_ioapic_pins(struct kvm *kvm) +{ + return 
kvm-arch.nr_reserved_ioapic_pins; +} diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h index ca0b0b4..3af349c 100644 --- a/arch/x86/kvm/ioapic.h +++ b/arch/x86/kvm/ioapic.h @@ -9,6 +9,7 @@ struct kvm; struct kvm_vcpu; #define IOAPIC_NUM_PINS KVM_IOAPIC_NUM_PINS +#define MAX_NR_RESERVED_IOAPIC_PINS 48 #define IOAPIC_VERSION_ID 0x11 /* IOAPIC version */ #define IOAPIC_EDGE_TRIG 0 #define IOAPIC_LEVEL_TRIG 1 @@ -123,4 +124,5 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, u32 *tmr); +void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); #endif diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 28eb946..766d297 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -209,8 +209,7 @@ out: if (old) kfree_rcu(old, rcu); - if (!irqchip_split(kvm)) - kvm_vcpu_request_scan_ioapic(kvm); + kvm_make_scan_ioapic_request(kvm); } static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5e01810..35d13d4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3930,15 +3930,20 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, case KVM_CAP_SPLIT_IRQCHIP: { mutex_lock(kvm-lock); r = -EEXIST; - if (lapic_in_kernel(kvm)) + if (irqchip_in_kernel(kvm)) goto split_irqchip_unlock; r = -EINVAL; - if (atomic_read(kvm-online_vcpus)) - goto split_irqchip_unlock; - r = kvm_setup_empty_irq_routing(kvm); - if (r) + if (cap-args[0] MAX_NR_RESERVED_IOAPIC_PINS) goto split_irqchip_unlock; - kvm-arch.irqchip_split = true; + if (!irqchip_split(kvm)) { + if (atomic_read(kvm-online_vcpus)) + goto split_irqchip_unlock; + r = kvm_setup_empty_irq_routing(kvm); + if (r) + goto split_irqchip_unlock; + kvm-arch.irqchip_split = true; + } + kvm-arch.nr_reserved_ioapic_pins = cap-args[0]; r = 0; split_irqchip_unlock: mutex_unlock(kvm-lock); @@ -6403,8 +6408,17 @@ static int 
vcpu_enter_guest(struct kvm_vcpu *vcpu) goto out; } } - if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu)) -
[PATCH v3 2/4] KVM: x86: Add KVM exit for IOAPIC EOIs
Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI level-triggered IOAPIC interrupts. Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs to be informed (which is identical to the EOI_EXIT_BITMAP field used by modern x86 processors, but can also be used to elide kvm IOAPIC EOI exits on older processors). [Note: A prototype using ResampleFDs found that decoupling the EOI from the VCPU's thread made it possible for the VCPU to not see a recent EOI after reentering the guest. This does not match real hardware.] Compile tested for Intel x86. Signed-off-by: Steve Rutherford srutherf...@google.com --- Documentation/virtual/kvm/api.txt | 10 ++ arch/x86/include/asm/kvm_host.h | 3 +++ arch/x86/kvm/lapic.c | 9 + arch/x86/kvm/x86.c| 11 +++ include/linux/kvm_host.h | 1 + include/uapi/linux/kvm.h | 5 + 6 files changed, 39 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 9a43d42..6ab2a3f7 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3271,6 +3271,16 @@ Valid values for 'type' are: */ __u64 kvm_valid_regs; __u64 kvm_dirty_regs; + + /* KVM_EXIT_IOAPIC_EOI */ +struct { + __u8 vector; +} eoi; + +Indicates that an eoi of a level triggered IOAPIC interrupt on vector has +occurred, which should be handled by the userspace IOAPIC. Triggers when +the Irqchip has been split between userspace and the kernel. 
+ union { struct kvm_sync_regs regs; char padding[1024]; diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index af3225a..2778d36 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -540,6 +540,9 @@ struct kvm_vcpu_arch { struct { bool pv_unhalted; } pv; + + u64 eoi_exit_bitmaps[4]; + int pending_ioapic_eoi; }; struct kvm_lpage_info { diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 92f4c98..28eb946 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -869,6 +869,15 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) { + if (irqchip_split(apic-vcpu-kvm)) { + if (test_bit(vector, +(void *) apic-vcpu-arch.eoi_exit_bitmaps)) { + apic-vcpu-arch.pending_ioapic_eoi = vector; + kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic-vcpu); + } + return; + } + if (kvm_ioapic_handles_vector(apic-vcpu-kvm, vector)) { int trigger_mode; if (apic_test_vector(vector, apic-regs + APIC_TMR)) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 19c8980..5e01810 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6392,6 +6392,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_handle_pmu_event(vcpu); if (kvm_check_request(KVM_REQ_PMI, vcpu)) kvm_deliver_pmi(vcpu); + if (kvm_check_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu)) { + BUG_ON(vcpu-arch.pending_ioapic_eoi 255); + if (test_bit(vcpu-arch.pending_ioapic_eoi, +(void *) vcpu-arch.eoi_exit_bitmaps)) { + vcpu-run-exit_reason = KVM_EXIT_IOAPIC_EOI; + vcpu-run-eoi.vector = + vcpu-arch.pending_ioapic_eoi; + r = 0; + goto out; + } + } if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu)) vcpu_scan_ioapic(vcpu); if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu)) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7e2b41a..c6df36f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -134,6 +134,7 @@ static inline 
bool is_error_page(struct page *page) #define KVM_REQ_ENABLE_IBS23 #define KVM_REQ_DISABLE_IBS 24 #define KVM_REQ_APIC_PAGE_RELOAD 25 +#define KVM_REQ_IOAPIC_EOI_EXIT 26 #define KVM_USERSPACE_IRQ_SOURCE_ID0 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 1e6f6c3..826a08d 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -183,6 +183,7 @@ struct kvm_s390_skeys { #define KVM_EXIT_EPR 23 #define KVM_EXIT_SYSTEM_EVENT 24 #define KVM_EXIT_S390_STSI25 +#define KVM_EXIT_IOAPIC_EOI 26 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -329,6 +330,10 @@
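The per-vCPU `eoi_exit_bitmaps[4]` used above is a plain 256-bit bitmap indexed by interrupt vector, so the `test_bit()` check reduces to word/bit arithmetic over four 64-bit words. A stand-alone sketch of that check — the helper names are ours, not KVM's:

```c
#include <stdint.h>

/* 256 interrupt vectors tracked in four 64-bit words, mirroring the
 * layout of the patch's eoi_exit_bitmaps[4]. */
static void set_eoi_exit(uint64_t bitmap[4], unsigned int vector)
{
	bitmap[vector / 64] |= (uint64_t)1 << (vector % 64);
}

/* Mirrors the test_bit() check: an EOI exits to the userspace IOAPIC
 * only when the EOI'd vector's bit is set for this vCPU. */
static int eoi_needs_exit(const uint64_t bitmap[4], unsigned int vector)
{
	return (int)((bitmap[vector / 64] >> (vector % 64)) & 1);
}
```

This is the same structure as the EOI_EXIT_BITMAP VMCS field on modern Intel processors, which is why the patch can reuse one bitmap both for hardware-accelerated and software-emulated EOI interception.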
[PATCH v3 4/4] KVM: x86: Add support for local interrupt requests from userspace
In order to enable userspace PIC support, the userspace PIC needs to be able to inject local interrupt requests. This adds the ioctl KVM_REQUEST_PIC_INJECTION and kvm exit KVM_EXIT_GET_EXTINT. The vm ioctl KVM_REQUEST_PIC_INJECTION makes a KVM_REQ_EVENT request on the BSP, which causes the BSP to exit to userspace to fetch the vector of the underlying external interrupt, which the BSP then injects into the guest. This matches the PIC spec, and is necessary to boot Windows. Compiles for x86. Update: Boots Windows and passes the KVM Unit Tests. Signed-off-by: Steve Rutherford srutherf...@google.com --- Documentation/virtual/kvm/api.txt | 9 ++ arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/irq.c| 22 +-- arch/x86/kvm/lapic.c | 7 + arch/x86/kvm/lapic.h | 2 ++ arch/x86/kvm/x86.c| 59 +-- include/uapi/linux/kvm.h | 7 + 7 files changed, 103 insertions(+), 5 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6ab2a3f7..b5d90cb 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2979,6 +2979,15 @@ len must be a multiple of sizeof(struct kvm_s390_irq). It must be 0 and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq), which is the maximum number of possibly pending cpu-local interrupts. +4.96 KVM_REQUEST_PIC_INJECTION + +Capability: KVM_CAP_SPLIT_IRQCHIP +Type: VM ioctl +Parameters: none +Returns: 0 on success, -1 on error. + +Informs the kernel that userspace has a pending external interrupt. + 5. 
The kvm_run structure diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f439ff..0e8b0fc 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -543,6 +543,8 @@ struct kvm_vcpu_arch { u64 eoi_exit_bitmaps[4]; int pending_ioapic_eoi; + bool userspace_extint_available; + int pending_external_vector; }; struct kvm_lpage_info { diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index 706e47a..1270b2a 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -38,12 +38,25 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) EXPORT_SYMBOL(kvm_cpu_has_pending_timer); /* + * check if there is a pending userspace external interrupt + */ +static int pending_userspace_extint(struct kvm_vcpu *v) +{ + return v-arch.userspace_extint_available || + v-arch.pending_external_vector != -1; +} + +/* * check if there is pending interrupt from * non-APIC source without intack. */ static int kvm_cpu_has_extint(struct kvm_vcpu *v) { - if (kvm_apic_accept_pic_intr(v)) + u8 accept = kvm_apic_accept_pic_intr(v); + + if (accept irqchip_split(v-kvm)) + return pending_userspace_extint(v); + else if (accept) return pic_irqchip(v-kvm)-output; /* PIC */ else return 0; @@ -91,7 +104,12 @@ EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); */ static int kvm_cpu_get_extint(struct kvm_vcpu *v) { - if (kvm_cpu_has_extint(v)) + if (irqchip_split(v-kvm) kvm_cpu_has_extint(v)) { + int vector = v-arch.pending_external_vector; + + v-arch.pending_external_vector = -1; + return vector; + } else if (kvm_cpu_has_extint(v)) return kvm_pic_read_irq(v-kvm); /* PIC */ return -1; } diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 766d297..012b56ee 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2094,3 +2094,10 @@ void kvm_lapic_init(void) jump_label_rate_limit(apic_hw_disabled, HZ); jump_label_rate_limit(apic_sw_disabled, HZ); } + +void kvm_request_pic_injection(struct kvm_vcpu *vcpu) +{ + 
vcpu-arch.userspace_extint_available = true; + kvm_make_request(KVM_REQ_EVENT, vcpu); + kvm_vcpu_kick(vcpu); +} diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 71b150c..7831e4d 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -63,6 +63,8 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, unsigned long *dest_map); int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); +void kvm_request_pic_injection(struct kvm_vcpu *vcpu); + bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 35d13d4..40e7509 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -65,6 +65,8 @@ #include asm/pvclock.h #include asm/div64.h +#define GET_VECTOR_FROM_USERSPACE 1 + #define MAX_IO_MSRS 256 #define KVM_MAX_MCE_BANKS 32 #define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P) @@ -4217,6 +4219,30 @@ long
Re: [PATCH 05/15] KVM: MTRR: clean up mtrr default type
Thanks for your review, Paolo!

On 06/01/2015 05:11 PM, Paolo Bonzini wrote:

 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 562341b..6de49dd 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -105,7 +105,6 @@ EXPORT_SYMBOL_GPL(kvm_mtrr_valid);
 static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 {
 	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
-	unsigned char mtrr_enabled = mtrr_state->enabled;
 	gfn_t start, end, mask;
 	int index;
 	bool is_fixed = true;
@@ -114,7 +113,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
 		return;
 
-	if (!(mtrr_enabled & 0x2) && msr != MSR_MTRRdefType)
+	if (!mtrr_state->mtrr_enabled && msr != MSR_MTRRdefType)

I know Linus doesn't like bitfields too much. Can you change these to inline functions, and only leave an u64 deftype in the struct?

Sure. I will introduce mtrr_is_enabled() and fixed_mtrr_is_enabled() instead of these bitfields.
Re: [PATCH 07/15] KVM: MTRR: improve kvm_mtrr_get_guest_memory_type
On 06/01/2015 05:16 PM, Paolo Bonzini wrote:

On 30/05/2015 12:59, Xiao Guangrong wrote:

- kvm_mtrr_get_guest_memory_type() only checks one page in MTRRs so that it's unnecessary to check to see if the range is partially covered in MTRR
- optimize the check of overlap memory type and add some comments to explain the precedence

Signed-off-by: Xiao Guangrong <guangrong.x...@linux.intel.com>
---
 arch/x86/kvm/mtrr.c | 89 ++++++++++++++++++----------------------
 1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index bc9c6da..d3c06d2 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -231,24 +231,16 @@ int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	return 0;
 }
 
-/*
- * The function is based on mtrr_type_lookup() in
- * arch/x86/kernel/cpu/mtrr/generic.c
- */
-static int get_mtrr_type(struct kvm_mtrr *mtrr_state,
-			 u64 start, u64 end)
+u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
-	u64 base, mask;
-	u8 prev_match, curr_match;
-	int i, num_var_ranges = KVM_NR_VAR_MTRR;
+	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
+	u64 base, mask, start = gfn_to_gpa(gfn);
+	int i, num_var_ranges = KVM_NR_VAR_MTRR, type_mask, type = -1;

Do not mix initialized and uninitialized variables on the same line (preexisting, I know, but let's fix it instead of making it worse :)). Please put each initialized variable on a separate line.

Okay. Will follow this style in the future development.

Also please initialize type_mask here (more on this below).

 	/* MTRR is completely disabled, use UC for all of physical memory. */
 	if (!mtrr_state->mtrr_enabled)
 		return MTRR_TYPE_UNCACHABLE;
 
-	/* Make end inclusive end, instead of exclusive */
-	end--;
-
 	/* Look in fixed ranges.
Just return the type as per start */ if (mtrr_state-fixed_mtrr_enabled (start 0x10)) { int idx; @@ -273,9 +265,9 @@ static int get_mtrr_type(struct kvm_mtrr *mtrr_state, * Look of multiple ranges matching this address and pick type * as per MTRR precedence */ - prev_match = 0xFF; + type_mask = (1 MTRR_TYPE_WRBACK) | (1 MTRR_TYPE_WRTHROUGH); for (i = 0; i num_var_ranges; ++i) { - unsigned short start_state, end_state; + int curr_type; if (!(mtrr_state-var_ranges[i].mask (1 11))) continue; @@ -283,50 +275,57 @@ static int get_mtrr_type(struct kvm_mtrr *mtrr_state, base = mtrr_state-var_ranges[i].base PAGE_MASK; mask = mtrr_state-var_ranges[i].mask PAGE_MASK; - start_state = ((start mask) == (base mask)); - end_state = ((end mask) == (base mask)); - if (start_state != end_state) - return 0xFE; - if ((start mask) != (base mask)) continue; - curr_match = mtrr_state-var_ranges[i].base 0xff; - if (prev_match == 0xFF) { - prev_match = curr_match; + /* +* Please refer to Intel SDM Volume 3: 11.11.4.1 MTRR +* Precedences. +*/ + + curr_type = mtrr_state-var_ranges[i].base 0xff; + if (type == -1) { + type = curr_type; continue; } - if (prev_match == MTRR_TYPE_UNCACHABLE || - curr_match == MTRR_TYPE_UNCACHABLE) + /* +* If two or more variable memory ranges match and the +* memory types are identical, then that memory type is +* used. +*/ + if (type == curr_type) + continue; + + /* +* If two or more variable memory ranges match and one of +* the memory types is UC, the UC memory type used. +*/ + if (curr_type == MTRR_TYPE_UNCACHABLE) return MTRR_TYPE_UNCACHABLE; - if ((prev_match == MTRR_TYPE_WRBACK -curr_match == MTRR_TYPE_WRTHROUGH) || - (prev_match == MTRR_TYPE_WRTHROUGH -curr_match == MTRR_TYPE_WRBACK)) { - prev_match = MTRR_TYPE_WRTHROUGH; - curr_match = MTRR_TYPE_WRTHROUGH; + /* +* If two or more variable memory ranges match and the +* memory types are WT and WB, the WT memory type is used. 
+*/ + if (((1 type) type_mask) + ((1 curr_type) type_mask)) { Please inline definition of type_mask in the if, or rename it to wt_wb_mask and make it const.
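The variable-range precedence rules being reviewed (Intel SDM Vol. 3, 11.11.4.1) can be sketched as a small pure function, using the const wt_wb_mask spelling Paolo suggests. This is an illustrative sketch, not the kernel's code; the type encodings are the architectural MTRR values.

```c
#include <assert.h>

/* Architectural MTRR memory-type encodings. */
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRTHROUGH  4
#define MTRR_TYPE_WRBACK     6

/* Resolve the memory type when two variable ranges overlap a page. */
static int mtrr_resolve_types(int prev, int curr)
{
	const int wt_wb_mask = (1 << MTRR_TYPE_WRBACK) |
			       (1 << MTRR_TYPE_WRTHROUGH);

	/* Identical types: that type is used. */
	if (prev == curr)
		return prev;
	/* UC wins over everything else. */
	if (prev == MTRR_TYPE_UNCACHABLE || curr == MTRR_TYPE_UNCACHABLE)
		return MTRR_TYPE_UNCACHABLE;
	/* WT + WB resolve to WT. */
	if (((1 << prev) & wt_wb_mask) && ((1 << curr) & wt_wb_mask))
		return MTRR_TYPE_WRTHROUGH;
	/* Any other combination is architecturally undefined; keep prev. */
	return prev;
}
```

Making the mask const and locally scoped keeps the WT/WB special case self-documenting without a bare magic `type_mask` living at function scope.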
Re: [PATCH 10/15] KVM: MTRR: sort variable MTRRs
On 06/01/2015 05:27 PM, Paolo Bonzini wrote:

On 30/05/2015 12:59, Xiao Guangrong wrote:

Sort all valid variable MTRRs based on their base address; it will help us to check whether a range is fully contained in the variable MTRRs.

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/mtrr.c             | 39 +++
 arch/x86/kvm/x86.c              |  2 +-
 arch/x86/kvm/x86.h              |  1 +
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f3fc152..5be8f2e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -347,6 +347,7 @@ enum {
 struct kvm_mtrr_range {
 	u64 base;
 	u64 mask;
+	struct list_head node;
 };

 struct kvm_mtrr {
@@ -362,6 +363,8 @@ struct kvm_mtrr {
 			unsigned mtrr_enabled:1;
 		};
 	};
+
+	struct list_head head;
 };

 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index aeb9767..8e3b81a 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -262,6 +262,38 @@ do_zap:
 	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
 }

+static bool var_mtrr_range_is_valid(struct kvm_mtrr_range *range)
+{
+	u64 start, end;
+
+	if (!(range->mask & (1 << 11)))
+		return false;
+
+	var_mtrr_range(range, &start, &end);
+	return end > start;
+}
+
+static void set_var_mtrr_start(struct kvm_mtrr *mtrr_state, int index)
+{
+	/* remove the entry if it's in the list. */
+	if (var_mtrr_range_is_valid(&mtrr_state->var_ranges[index]))
+		list_del(&mtrr_state->var_ranges[index].node);
+}
+
+static void set_var_mtrr_end(struct kvm_mtrr *mtrr_state, int index)
+{
+	struct kvm_mtrr_range *tmp, *cur = &mtrr_state->var_ranges[index];
+
+	/* add it to the list if it's valid. */
+	if (var_mtrr_range_is_valid(&mtrr_state->var_ranges[index])) {
+		list_for_each_entry(tmp, &mtrr_state->head, node)
+			if (cur->base < tmp->base) {
+				list_add_tail(&cur->node, &tmp->node);
+				return;
+			}
+
+		list_add_tail(&cur->node, &mtrr_state->head);
+	}
+}
+
 int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	int index;
@@ -281,10 +313,12 @@ int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		index = (msr - 0x200) / 2;
 		is_mtrr_mask = msr - 0x200 - 2 * index;
+		set_var_mtrr_start(&vcpu->arch.mtrr_state, index);
 		if (!is_mtrr_mask)
 			vcpu->arch.mtrr_state.var_ranges[index].base = data;
 		else
 			vcpu->arch.mtrr_state.var_ranges[index].mask = data;
+		set_var_mtrr_end(&vcpu->arch.mtrr_state, index);

Just move the whole code to a new kvm_set_var_mtrr function.

Okay.

 	}
 	update_mtrr(vcpu, msr);
@@ -331,6 +365,11 @@ int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	return 0;
 }

+void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu)
+{
+	INIT_LIST_HEAD(&vcpu->arch.mtrr_state.head);
+}
+
 u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64c2891..8084ac3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6991,7 +6991,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 	kvm_vcpu_reset(vcpu, false);
 	kvm_mmu_setup(vcpu);
 	vcpu_put(vcpu);
-
+	kvm_vcpu_mtrr_init(vcpu);

Please move this call much earlier.

Okay, will do these in v2.
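The set_var_mtrr_start()/set_var_mtrr_end() pair above implements a "delete, update, re-insert sorted" dance on a linked list. As a toy model of the same invariant (an index array stands in for the kernel's list_head; everything here is illustrative, not the kernel's code):

```c
#include <assert.h>
#include <stdint.h>

#define NR_RANGES 8	/* stand-in for KVM_NR_VAR_MTRR */

struct range {
	uint64_t base;
	int valid;	/* stand-in for the V bit in the mask MSR */
};

/* Rebuild 'order' so it holds the indices of all valid ranges,
 * ascending by base address; returns the number of valid ranges. */
static int resort(const struct range *r, int order[NR_RANGES])
{
	int n = 0;

	for (int i = 0; i < NR_RANGES; i++) {
		if (!r[i].valid)
			continue;
		/* insertion sort: shift larger bases up, drop i in. */
		int j = n++;
		while (j > 0 && r[order[j - 1]].base > r[i].base) {
			order[j] = order[j - 1];
			j--;
		}
		order[j] = i;
	}
	return n;
}
```

The kernel version avoids the full re-sort by unlinking just the one changed entry and re-inserting it at its sorted position, which is O(n) per MSR write instead of O(n log n).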
Re: [PATCH 2/2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
On Thu, May 28, 2015 at 03:17:20PM +1000, Paul Mackerras wrote:

This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore.

Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch.

This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way; and if it is 6, both 2-way and 4-way micro-threading modes will be considered. The default is 6.

With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them.

Signed-off-by: Paul Mackerras pau...@samba.org

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
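The dynamic_mt_modes bitmap semantics described above can be sketched as a tiny predicate; the parameter name comes from the patch, but the helper and its gating logic are an illustrative sketch, not the actual KVM PPC code.

```c
#include <assert.h>

/* Bit n set in the bitmap means n-way micro-threading may be used.
 * Default 6 = (bit 2 | bit 4) = allow both 2-way and 4-way. */
static int dynamic_mt_modes = 6;

/* Only 2-way and 4-way split modes exist on POWER8. */
static int can_split_n_way(int n)
{
	return (n == 2 || n == 4) && (dynamic_mt_modes & n);
}
```

So with the default of 6 both split modes are tried, setting the parameter to 0 disables dynamic micro-threading entirely, and 2 or 4 restricts it to a single mode, matching the description in the commit message.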
Re: [PATCH 08/15] KVM: MTRR: introduce fixed_mtrr_segment table
On 06/01/2015 05:25 PM, Paolo Bonzini wrote:

On 30/05/2015 12:59, Xiao Guangrong wrote:

This table summarizes the information of fixed MTRRs and introduces some APIs to abstract its operations, which helps us to clean up the code and will be used in later patches.

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 arch/x86/kvm/mtrr.c | 191 ++--
 1 file changed, 142 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index d3c06d2..888441e 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -102,12 +102,126 @@ bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_mtrr_valid);

+struct fixed_mtrr_segment {
+	u64 start;
+	u64 end;
+
+	/*
+	 * unit corresponds to the MSR entry in this segment, the size
+	 * of unit is covered in one MSR.
+	 */
+	u64 unit_size;
+
+	/* a range is covered in one memory cache type. */
+	u64 range_size;
+
+	/* the start position in kvm_mtrr.fixed_ranges[]. */
+	int range_start;
+};
+
+static struct fixed_mtrr_segment fixed_seg_table[] = {
+	/* MSR_MTRRfix64K_00000, 1 unit. 64K fixed mtrr. */
+	{
+		.start = 0x0,
+		.end = 0x80000,
+		.unit_size = 0x80000,

Unit size is always range size * 8.

Good catch, will drop this.

+		.range_size = 1ULL << 16,

Perhaps 64 * 1024 (and so on below) is clearer, because it matches the name of the MSR?

Yes, it's better.

+		.range_start = 0,
+	},
+
+	/*
+	 * MSR_MTRRfix16K_80000 ... MSR_MTRRfix16K_A0000, 2 units,
+	 * 16K fixed mtrr.
+	 */
+	{
+		.start = 0x80000,
+		.end = 0xc0000,
+		.unit_size = (0xc0000 - 0x80000) / 2,
+		.range_size = 1ULL << 14,
+		.range_start = 8,
+	},
+
+	/*
+	 * MSR_MTRRfix4K_C0000 ... MSR_MTRRfix4K_F8000, 8 units,
+	 * 4K fixed mtrr.
+	 */
+	{
+		.start = 0xc0000,
+		.end = 0x100000,
+		.unit_size = (0x100000 - 0xc0000) / 8,
+		.range_size = 1ULL << 12,
+		.range_start = 24,
+	}
};

+static int fixed_msr_to_range_index(u32 msr)
+{
+	int seg, unit;
+
+	if (!fixed_msr_to_seg_unit(msr, &seg, &unit))
+		return -1;
+
+	fixed_mtrr_seg_unit_range_index(seg, unit);

This looks wrong.

Oops, should be

	return fixed_mtrr_seg_unit_range_index(seg, unit);

:(

+	return 0;
+}
+
 static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 {
 	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
 	gfn_t start, end, mask;
 	int index;
-	bool is_fixed = true;

 	if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
 	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
@@ -116,32 +230,19 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 	if (!mtrr_state->mtrr_enabled && msr != MSR_MTRRdefType)
 		return;

+	if (fixed_msr_to_range(msr, &start, &end)) {
+		if (!mtrr_state->fixed_mtrr_enabled)
+			return;
+		goto do_zap;
+	}
+
 	switch (msr) {

Please move defType handling in an if above "if (fixed_msr_to_range(msr, &start, &end))". Then you don't need the goto.

Yes, indeed. Will do it in the next version.
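Paolo's suggested restructuring of update_mtrr() — defType in its own branch ahead of the fixed-range lookup, so the goto disappears — can be sketched as control flow like this. All helpers, MSR ranges, and return conventions here are stand-ins for illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_MTRRdefType 0x2ff

/* Stand-in for the real fixed_msr_to_range(): treat 0x250
 * (MSR_MTRRfix64K_00000) as the only fixed MSR, covering 0-512K. */
static bool fixed_msr_to_range(uint32_t msr, uint64_t *start, uint64_t *end)
{
	if (msr != 0x250)
		return false;
	*start = 0x0;
	*end = 0x80000;
	return true;
}

/* Returns true and fills [start, end) when a zap is needed.
 * defType is handled first, so no goto is required. */
static bool update_mtrr_range(uint32_t msr, bool fixed_enabled,
			      uint64_t *start, uint64_t *end)
{
	if (msr == MSR_MTRRdefType) {
		*start = 0;
		*end = ~0ULL;		/* zap everything */
		return true;
	}
	if (fixed_msr_to_range(msr, start, end))
		return fixed_enabled;	/* fixed ranges may be disabled */
	return false;			/* variable-MSR handling elided */
}
```

The shape mirrors the review comment: each MSR class gets one early branch, and the zap range falls out of the branch that matched.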
Re: [PATCH 12/15] KVM: MTRR: introduce mtrr_for_each_mem_type
On 06/01/2015 05:33 PM, Paolo Bonzini wrote:

On 30/05/2015 12:59, Xiao Guangrong wrote:

+struct mtrr_looker {
+	/* input fields. */
+	struct kvm_mtrr *mtrr_state;
+	u64 start;
+	u64 end;

s/looker/iter/ or s/looker/lookup/

Good to me.

+static void mtrr_lookup_start(struct mtrr_looker *looker)
+{
+	looker->mem_type = -1;
+
+	if (!looker->mtrr_state->mtrr_enabled) {
+		looker->partial_map = true;
+		return;
+	}
+
+	if (!mtrr_lookup_fixed_start(looker))
+		mtrr_lookup_var_start(looker);
+}
+

Separating mtrr_lookup_start and mtrr_lookup_init is weird.

Agreed, will fold mtrr_lookup_start() into mtrr_lookup_init().

There are common parts of mtrr_lookup_*_start and mtrr_lookup_*_next. For example this:

+	looker->mem_type = looker->mtrr_state->fixed_ranges[index];
+	looker->start = fixed_mtrr_range_end_addr(seg, index);
+	return true;

in mtrr_lookup_fixed_start is the same as this:

+	end = fixed_mtrr_range_end_addr(looker->seg, looker->index);
+
+	/* switch to next segment. */
+	if (end >= eseg->end) {
+		looker->seg++;
+		looker->index = 0;
+
+		/* have looked up for all fixed MTRRs. */
+		if (looker->seg >= ARRAY_SIZE(fixed_seg_table))
+			return mtrr_lookup_var_start(looker);
+
+		end = fixed_mtrr_range_end_addr(looker->seg, looker->index);
+	}
+
+	looker->mem_type = mtrr_state->fixed_ranges[looker->index];
+	looker->start = end;

in mtrr_lookup_fixed_next. Can you try to make them more common?

Yes, I will check them carefully and introduce common functions to do it.

Basically you should have

+#define mtrr_for_each_mem_type(_looker_, _mtrr_, _gpa_start_, _gpa_end_) \
+	for (mtrr_lookup_init(_looker_, _mtrr_, _gpa_start_, _gpa_end_); \
+	     !mtrr_lookup_end(_looker_); mtrr_lookup_next(_looker_))

Good idea, will do it in v2.
Re: [PATCH 12/15] KVM: MTRR: introduce mtrr_for_each_mem_type
On 06/01/2015 10:26 PM, Paolo Bonzini wrote:

On 01/06/2015 11:33, Paolo Bonzini wrote:

+	looker->mem_type = looker->mtrr_state->fixed_ranges[index];
+	looker->start = fixed_mtrr_range_end_addr(seg, index);
+	return true;

in mtrr_lookup_fixed_start is the same as this:

+	end = fixed_mtrr_range_end_addr(looker->seg, looker->index);
+
+	/* switch to next segment. */
+	if (end >= eseg->end) {
+		looker->seg++;
+		looker->index = 0;
+
+		/* have looked up for all fixed MTRRs. */
+		if (looker->seg >= ARRAY_SIZE(fixed_seg_table))
+			return mtrr_lookup_var_start(looker);
+
+		end = fixed_mtrr_range_end_addr(looker->seg, looker->index);
+	}
+
+	looker->mem_type = mtrr_state->fixed_ranges[looker->index];
+	looker->start = end;

in mtrr_lookup_fixed_next. Can you try to make them more common?

Basically you should have

+#define mtrr_for_each_mem_type(_looker_, _mtrr_, _gpa_start_, _gpa_end_) \
+	for (mtrr_lookup_init(_looker_, _mtrr_, _gpa_start_, _gpa_end_); \
+	     !mtrr_lookup_end(_looker_); mtrr_lookup_next(_looker_))

... where the above code I quoted would be part of mtrr_lookup_end.

Okay, will do. :)
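The iterator shape being agreed on above — an init/end/next triple glued together by a for-macro — looks roughly like this over a toy range array. Only the macro structure mirrors the review; the state struct and helpers are hypothetical simplifications of the real MTRR lookup.

```c
#include <assert.h>

/* Toy iterator state: walks a flat array of memory types. */
struct mtrr_iter {
	const int *types;
	int count;
	int index;
	int mem_type;	/* type of the range currently under the cursor */
};

static void mtrr_lookup_init(struct mtrr_iter *it, const int *types, int count)
{
	it->types = types;
	it->count = count;
	it->index = 0;
	it->mem_type = count ? types[0] : -1;
}

static int mtrr_lookup_end(struct mtrr_iter *it)
{
	return it->index >= it->count;	/* true once iteration is done */
}

static void mtrr_lookup_next(struct mtrr_iter *it)
{
	if (++it->index < it->count)
		it->mem_type = it->types[it->index];
}

#define mtrr_for_each_mem_type(_it_, _types_, _n_) \
	for (mtrr_lookup_init(_it_, _types_, _n_); \
	     !mtrr_lookup_end(_it_); mtrr_lookup_next(_it_))
```

Folding the "advance and detect exhaustion" logic into mtrr_lookup_end/next is exactly what lets the fixed-start and fixed-next duplication collapse in the real patch.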
Re: [PATCH 14/15] KVM: MTRR: do not map huge page for non-consistent range
On 06/01/2015 05:36 PM, Paolo Bonzini wrote:

On 30/05/2015 12:59, Xiao Guangrong wrote:

Currently, guest MTRR is completely prohibited if cache snooping is supported on the IOMMU (!noncoherent_dma) and the host does the emulation based on knowledge from the host side; however, the host side is not in a good position to know the guest's intent. A good example is that a passed-through VGA frame buffer is not always UC as the host expects.

Can you explain how? The original idea was that such a framebuffer would be kvm_is_reserved_pfn and thus be unconditionally UC.

Yes, the frame buffer is always UC in the current code; however, UC for the frame buffer causes bad performance. Quoting Documentation/x86/mtrr.txt:

| This is most useful when you have a video (VGA) card on a PCI or AGP bus.
| Enabling write-combining allows bus write transfers to be combined into a
| larger transfer before bursting over the PCI/AGP bus. This can increase
| performance of image write operations 2.5 times or more.

So the guest will configure that range in its MTRRs; this patchset follows guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to emulate the guest cache type as the guest expects.
Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time
[replying to myself]

On Tue, Jun 02, 2015 at 11:27:59AM +0200, Christoffer Dall wrote:

[..]

If this patch is incorrect, then how does it work on x86, where handle_external_intr() is called (with a barrier in between) before kvm_guest_exit(), and where handle_external_intr() is simply local_irq_enable() on SVM and something more complicated on VMX?

Finally, how exactly is preemption latency added here? Won't IRQ processing run with higher priority than any task on your system, so the order of (1) process pending IRQs, (2) call schedule if needed, is still preserved here, but we call kvm_guest_exit() between (1) and (2) instead of before (1)?

I may be missing something, but on return from interrupt with preempt disabled we can't take the need-resched path, and need to return to KVM, no?

preempt_enable() will call __preempt_schedule() and cause preemption there, so you're talking about adding these lines of latency:

kvm_guest_exit();
trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));

And these were called with interrupts disabled before, so I don't see the issue??

However, your question is making me think whether we have a race in the current code on fully preemptible kernels: if we get preempted before calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we could potentially schedule another vcpu on this core and lose/corrupt state, can we not? We probably need to check for this in kvm_vcpu_load/kvm_vcpu_put. I need to think more about whether this is a real issue or if I'm seeing ghosts.

I've thought about it and I don't think there's a race, because those functions don't access the hardware directly, but only manipulate per-vcpu data structures.

-Christoffer
Re: [PATCH 1/2] KVM: PPC: Book3S HV: Make use of unused threads when running guests
On Thu, May 28, 2015 at 03:17:19PM +1000, Paul Mackerras wrote:

When running a virtual core of a guest that is configured with fewer threads per core than the physical cores have, the extra physical threads are currently unused. This makes it possible to use them to run one or more other virtual cores from the same guest when certain conditions are met. This applies on POWER7, and on POWER8 to guests with one thread per virtual core. (It doesn't apply to POWER8 guests with multiple threads per vcore because they require a 1-1 virtual to physical thread mapping in order to be able to use msgsndp and the TIR.)

The idea is that we maintain a list of preempted vcores for each physical cpu (i.e. each core, since the host runs single-threaded). Then, when a vcore is about to run, it checks to see if there are any vcores on the list for its physical cpu that could be piggybacked onto this vcore's execution. If so, those additional vcores are put into state VCORE_PIGGYBACK and their runnable VCPU threads are started as well as the original vcore, which is called the master vcore. After the vcores have exited the guest, the extra ones are put back onto the preempted list if any of their VCPUs are still runnable and not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as the physical thread that the vcpu runs on. In order to make it easier for code that wants to send an IPI to know which CPU to target, we now store that in a new field in struct vcpu_arch, called thread_cpu.

Signed-off-by: Paul Mackerras pau...@samba.org

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
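The piggyback decision described above — can a preempted vcore ride along with the master vcore on the same physical core? — reduces to a capacity check. This is a toy sketch; the thread-count limit, names, and the single condition shown are illustrative, not the actual conditions the patch checks (which also include VM ownership and threading mode).

```c
#include <assert.h>

#define THREADS_PER_CORE 8	/* POWER8 SMT8 core */

struct vcore {
	int nr_threads;		/* runnable vcpu threads in this vcore */
};

/* Can 'pvc' be piggybacked given 'threads_in_use' already committed? */
static int can_piggyback(const struct vcore *master, const struct vcore *pvc,
			 int threads_in_use)
{
	(void)master;	/* a real check also compares VM ownership etc. */
	return threads_in_use + pvc->nr_threads <= THREADS_PER_CORE;
}
```

With a 4-thread master vcore already occupying the core, a 4-thread preempted vcore still fits, while a 5-thread one does not.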