Re: [PATCH 00/14] KVM: PPC: Book3S HV: PCI Passthrough Interrupt Optimizations
On 02/26/2016 12:40 PM, Suresh Warrier wrote:
> This patch set adds support for handling interrupts for PCI adapters
> entirely in the guest under the right conditions. When an interrupt
> is received by KVM in real mode, if the interrupt is from a PCI
> passthrough adapter owned by the guest, KVM will update the virtual
> ICP for the VCPU that is the target of the interrupt entirely in
> real mode and generate the virtual interrupt. If the VCPU is not
> running in the guest, it will wake up the VCPU. It will also update
> the affinity of the interrupt to directly target the CPU (core)
> where this VCPU is being scheduled, as an optimization.
>
> KVM needs the mapping between hardware interrupt numbers in the host
> and the virtual hardware interrupt (GSI) that needs to get injected
> into the guest. This patch set takes advantage of the IRQ bypass
> manager feature to create this mapping. For now, we allocate and
> manage a separate mapping structure per VM.
>
> Although a mapping is created for every passthrough IRQ requested
> in the guest, we also maintain a cache of mappings that is used to
> speed up search. For now, KVM real mode code only looks in the cache
> for a mapping. If no mapping is found, we fall back on the usual
> interrupt routing mechanism - switch back to host and run the VFIO
> interrupt handler.
>
> This is based on 4.5-rc1 plus the patch set in
> http://www.spinics.net/lists/kvm-ppc/msg11131.html since it has
> dependencies on vmalloc_to_phys() being public.
>
> Suresh Warrier (14):
>   powerpc: Add simple cache inhibited MMIO accessors
>   KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function
>   KVM: PPC: select IRQ_BYPASS_MANAGER
>   KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap
>   KVM: PPC: Book3S HV: Enable IRQ bypass
>   KVM: PPC: Book3S HV: Caching for passthrough IRQ map
>   KVM: PPC: Book3S HV: Handle passthrough interrupts in guest
>   KVM: PPC: Book3S HV: Complete passthrough interrupt in host
>   KVM: PPC: Book3S HV: Enable KVM real mode handling of passthrough IRQs
>   KVM: PPC: Book3S HV: Dump irqmap in debugfs
>   KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass
>   KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode
>   KVM: PPC: Book3S HV: Change affinity for passthrough IRQ
>   KVM: PPC: Book3S HV: Counters for passthrough IRQ stats
>
>  arch/powerpc/include/asm/io.h             |  28 +++
>  arch/powerpc/include/asm/kvm_asm.h        |  10 +
>  arch/powerpc/include/asm/kvm_book3s.h     |   1 +
>  arch/powerpc/include/asm/kvm_host.h       |  25 +++
>  arch/powerpc/include/asm/kvm_ppc.h        |  28 +++
>  arch/powerpc/include/asm/pnv-pci.h        |   1 +
>  arch/powerpc/kvm/Kconfig                  |   2 +
>  arch/powerpc/kvm/book3s.c                 |  45 +
>  arch/powerpc/kvm/book3s_hv.c              | 318 +-
>  arch/powerpc/kvm/book3s_hv_builtin.c      | 157 +++
>  arch/powerpc/kvm/book3s_hv_rm_xics.c      | 181 +
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 226 -
>  arch/powerpc/kvm/book3s_xics.c            |  68 ++-
>  arch/powerpc/kvm/book3s_xics.h            |   3 +
>  arch/powerpc/platforms/powernv/pci-ioda.c |  14 +-
>  15 files changed, 1013 insertions(+), 94 deletions(-)

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
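The per-VM mapping structure plus lookup cache described in the cover letter can be sketched in ordinary C roughly as follows. This is a hypothetical simplification for illustration: the structure, field names, and sizes here are invented, not taken from the patches. The point is only the two-level scheme: a full per-VM map maintained by the host, and a small fixed-size cache that is the only thing real-mode code searches.

```c
#include <stddef.h>

/* One hardware-IRQ -> guest-GSI mapping (illustrative fields only). */
struct irq_map {
    unsigned int host_irq;  /* hardware interrupt number in the host */
    unsigned int gsi;       /* virtual interrupt to inject into the guest */
};

#define IRQMAP_SIZE 1024    /* all passthrough mappings for this VM */
#define CACHE_SIZE  16      /* small cache searched by real-mode code */

struct passthru_irqmap {
    struct irq_map map[IRQMAP_SIZE];
    size_t n_mapped;
    struct irq_map cache[CACHE_SIZE];
    size_t n_cached;
};

/*
 * Real-mode lookup: search ONLY the cache.  Returns the GSI, or -1 to
 * signal "no cached mapping, exit to the host and let VFIO handle it".
 */
static long rm_irqmap_lookup(const struct passthru_irqmap *pimap,
                             unsigned int host_irq)
{
    size_t i;

    for (i = 0; i < pimap->n_cached; i++)
        if (pimap->cache[i].host_irq == host_irq)
            return pimap->cache[i].gsi;
    return -1;  /* fall back on the usual interrupt routing */
}

/* Host-side setup: record a mapping and, if there is room, cache it. */
static int irqmap_add(struct passthru_irqmap *pimap,
                      unsigned int host_irq, unsigned int gsi)
{
    if (pimap->n_mapped >= IRQMAP_SIZE)
        return -1;
    pimap->map[pimap->n_mapped].host_irq = host_irq;
    pimap->map[pimap->n_mapped].gsi = gsi;
    pimap->n_mapped++;
    if (pimap->n_cached < CACHE_SIZE)
        pimap->cache[pimap->n_cached++] =
            pimap->map[pimap->n_mapped - 1];
    return 0;
}
```

A real-mode miss (-1) is exactly what triggers the fall-back described in the cover letter: switch back to the host and run the VFIO interrupt handler.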
Re: [PATCH v3 8/9] KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU
This patch adds support to real-mode KVM to search for a core running
in the host partition and send it an IPI message with the VCPU to be
woken. This avoids having to switch to the host partition to complete
an H_IPI hypercall when the VCPU which is the target of the H_IPI is
not loaded (is not running in the guest).

The patch also includes the support in the IPI handler running in the
host to do the wakeup by calling kvmppc_xics_ipi_action for the
PPC_MSG_RM_HOST_ACTION message.

When a guest is being destroyed, we need to ensure that there are no
pending IPIs waiting to wake up a VCPU before we free the VCPUs of the
guest. This is accomplished by:
- Forcing a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
  before freeing any VCPUs in kvm_arch_destroy_vm().
- Executing any PPC_MSG_RM_HOST_ACTION messages before any other
  PPC_MSG_CALL_FUNCTION messages.

Signed-off-by: Suresh Warrier
---
Fixed build break for CONFIG_SMP=n (thanks to Mike Ellerman for
pointing that out).

 arch/powerpc/kernel/smp.c            | 11 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 92 ++--
 arch/powerpc/kvm/powerpc.c           | 10
 3 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e222efc..cb8be5d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -257,6 +257,17 @@ irqreturn_t smp_ipi_demux(void)

 	do {
 		all = xchg(&info->messages, 0);
+#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+		/*
+		 * Must check for PPC_MSG_RM_HOST_ACTION messages
+		 * before PPC_MSG_CALL_FUNCTION messages because when
+		 * a VM is destroyed, we call kick_all_cpus_sync()
+		 * to ensure that any pending PPC_MSG_RM_HOST_ACTION
+		 * messages have completed before we free any VCPUs.
+		 */
+		if (all & IPI_MESSAGE(PPC_MSG_RM_HOST_ACTION))
+			kvmppc_xics_ipi_action();
+#endif
 		if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION))
 			generic_smp_call_function_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 43ffbfe..e673fb9 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -51,11 +51,84 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,

 /* -- ICP routines -- */

+#ifdef CONFIG_SMP
+static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu)
+{
+	int hcpu;
+
+	hcpu = hcore << threads_shift;
+	kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
+	smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
+	icp_native_cause_ipi_rm(hcpu);
+}
+#else
+static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { }
+#endif
+
+/*
+ * We start the search from our current CPU Id in the core map
+ * and go in a circle until we get back to our ID looking for a
+ * core that is running in host context and that hasn't already
+ * been targeted for another rm_host_ops.
+ *
+ * In the future, could consider using a fairer algorithm (one
+ * that distributes the IPIs better).
+ *
+ * Returns -1, if no CPU could be found in the host
+ * Else, returns a CPU Id which has been reserved for use
+ */
+static inline int grab_next_hostcore(int start,
+		struct kvmppc_host_rm_core *rm_core, int max, int action)
+{
+	bool success;
+	int core;
+	union kvmppc_rm_state old, new;
+
+	for (core = start + 1; core < max; core++) {
+		old = new = READ_ONCE(rm_core[core].rm_state);
+
+		if (!old.in_host || old.rm_action)
+			continue;
+
+		/* Try to grab this host core if not taken already. */
+		new.rm_action = action;
+
+		success = cmpxchg64(&rm_core[core].rm_state.raw,
+				old.raw, new.raw) == old.raw;
+		if (success) {
+			/*
+			 * Make sure that the store to the rm_action is made
+			 * visible before we return to caller (and the
+			 * subsequent store to rm_data) to synchronize with
+			 * the IPI handler.
+			 */
+			smp_wmb();
+			return core;
+		}
+	}
+
+	return -1;
+}
+
+static inline int find_available_hostcore(int action)
+{
+	int core;
+	int my_core = smp_processor_id() >> threads_shift;
+	struct kvmppc_host_rm_core *rm_core = kvmppc_host_rm_ops_hv->rm_core;
+
+	core = grab_next_hostcore(my_core, rm_core, cpu_nr_cores(), action);
+	if (core == -1)
+		core = grab_next_hostcore(-1, rm_core, my_core + 1, action);
+
+	return core;
+}
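The core-claiming step in grab_next_hostcore() above is a lock-free reservation: read the per-core state, then publish the action with a compare-and-swap that fails if another CPU raced in first. A user-space model of that pattern is sketched below, with C11 atomics standing in for the kernel's READ_ONCE/cmpxchg64 and smp_wmb; the bit layout is invented for illustration and is not the actual kvmppc_rm_state.

```c
#include <stdatomic.h>
#include <stdint.h>

#define IN_HOST    (1u << 0)   /* core is running in host context */
#define ACTION_MSK 0xff00u     /* pending rm_action, 0 = none */

/* Per-core state word, updated atomically (models rm_state.raw). */
typedef struct { _Atomic uint32_t raw; } rm_core_t;

/*
 * Scan cores in (start, max) and try to claim the first one that is
 * in the host with no action pending.  Returns the claimed core's
 * index, or -1 if none could be reserved.
 */
static int grab_next_hostcore(int start, rm_core_t *cores, int max,
                              uint32_t action)
{
    for (int core = start + 1; core < max; core++) {
        uint32_t old = atomic_load(&cores[core].raw);

        if (!(old & IN_HOST) || (old & ACTION_MSK))
            continue;

        uint32_t new = old | (action << 8);
        /* The claim succeeds only if nobody raced us in between;
         * the CAS's ordering stands in for the kernel's smp_wmb(). */
        if (atomic_compare_exchange_strong(&cores[core].raw, &old, new))
            return core;
    }
    return -1;
}
```

Two concurrent callers can never claim the same core: whichever CAS lands second sees a changed state word and moves on, which is the property the real-mode code relies on when reserving a host core for the wakeup IPI.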
Re: [PATCH v3 9/9] KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection
Redirecting the wakeup of a VCPU from the H_IPI hypercall to a core
running in the host is usually a good idea; most workloads seemed to
benefit. However, in one heavily interrupt-driven SMT1 workload, some
regression was observed. This patch adds a kvm_hv module parameter
called h_ipi_redirect to control this feature. The default value for
this tunable is 1, i.e. the feature is enabled.

Signed-off-by: Suresh Warrier
---
Resending the updated patch with the updated diff since an earlier
patch (patch 8/9) had to be resent to fix a build break.

 arch/powerpc/include/asm/kvm_ppc.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c         | 11 +++
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  5 -
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 1b93519..29d1442 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -448,6 +448,7 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
 			struct kvm_vcpu *vcpu, u32 cpu);
 extern void kvmppc_xics_ipi_action(void);
+extern int h_ipi_redirect;
 #else
 static inline void kvmppc_alloc_host_rm_ops(void) {};
 static inline void kvmppc_free_host_rm_ops(void) {};

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d6280ed..182ec84 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,17 @@ static int target_smt_mode;
 module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");

+#ifdef CONFIG_KVM_XICS
+static struct kernel_param_ops module_param_ops = {
+	.set = param_set_int,
+	.get = param_get_int,
+};
+
+module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
+#endif
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index e673fb9..980d8a6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -24,6 +24,9 @@

 #define DEBUG_PASSUP

+int h_ipi_redirect = 1;
+EXPORT_SYMBOL(h_ipi_redirect);
+
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 			       u32 new_irq);

@@ -148,7 +151,7 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
 	cpu = vcpu->arch.thread_cpu;
 	if (cpu < 0 || cpu >= nr_cpu_ids) {
 		hcore = -1;
-		if (kvmppc_host_rm_ops_hv)
+		if (kvmppc_host_rm_ops_hv && h_ipi_redirect)
 			hcore = find_available_hostcore(XICS_RM_KICK_VCPU);
 		if (hcore != -1) {
 			icp_send_hcore_msg(hcore, vcpu);
--
1.8.3.4
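Since the parameter is registered with S_IRUGO | S_IWUSR, it should be readable and root-writable at run time through the standard module-parameter path in sysfs once kvm_hv is loaded (the path below assumes the usual /sys/module layout):

```shell
# h_ipi_redirect lives under the kvm_hv module's parameters
PARAM=/sys/module/kvm_hv/parameters/h_ipi_redirect

cat "$PARAM"       # read the current setting (default: 1)
echo 0 > "$PARAM"  # disable H_IPI redirection (as root)
echo 1 > "$PARAM"  # re-enable it
```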
Re: [2/2] powerpc/smp: Add smp_muxed_ipi_rm_message_pass
Hi Mike,

After looking at this a little more, I think it would perhaps be
better to define the real-mode function that causes the IPI in
book3s_hv_rm_xics.c, along with the other real-mode functions that
operate on the XICS. Hope this is acceptable to you. If not, we can
discuss when I re-submit the patch.

Thanks.
-suresh

On 11/16/2015 03:34 PM, Suresh E. Warrier wrote:
> Hi Mike,
>
> The changes you proposed look nicer than what I have here.
> I will get that coded and tested and re-submit.
>
> Thanks.
> -suresh
>
> On 11/15/2015 11:53 PM, Michael Ellerman wrote:
>> Hi Suresh,
>>
>> On Thu, 2015-29-10 at 23:40:45 UTC, "Suresh E. Warrier" wrote:
>>> This function supports IPI message passing for real
>>> mode callers.
>>>
>>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>>> index a53a130..8c07bfad 100644
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -235,6 +238,33 @@ void smp_muxed_ipi_message_pass(int cpu, int msg)
>>>  	smp_ops->cause_ipi(cpu, info->data);
>>>  }
>>>
>>> +#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
>>> +/*
>>> + * Message passing code for real mode callers. It does not use the
>>> + * smp_ops->cause_ipi function to cause an IPI, because those functions
>>> + * access the MFRR through an ioremapped address.
>>> + */
>>> +void smp_muxed_ipi_rm_message_pass(int cpu, int msg)
>>> +{
>>> +	struct cpu_messages *info = &per_cpu(ipi_message, cpu);
>>> +	char *message = (char *)&info->messages;
>>> +	unsigned long xics_phys;
>>> +
>>> +	/*
>>> +	 * Order previous accesses before accesses in the IPI handler.
>>> +	 */
>>> +	smp_mb();
>>> +	message[msg] = 1;
>>> +
>>> +	/*
>>> +	 * cause_ipi functions are required to include a full barrier
>>> +	 * before doing whatever causes the IPI.
>>> +	 */
>>> +	xics_phys = paca[cpu].kvm_hstate.xics_phys;
>>> +	out_rm8((u8 *)(xics_phys + XICS_MFRR), IPI_PRIORITY);
>>> +}
>>> +#endif
>>
>> I'm not all that happy with this. This function does two things, one
>> of which belongs in this file (setting the message), and the other
>> which definitely does not (the XICS part).
>>
>> I think the end result would be cleaner if we did something like:
>>
>> void smp_muxed_ipi_set_message(int cpu, int msg)
>> {
>> 	struct cpu_messages *info = &per_cpu(ipi_message, cpu);
>> 	char *message = (char *)&info->messages;
>>
>> 	/*
>> 	 * Order previous accesses before accesses in the IPI handler.
>> 	 */
>> 	smp_mb();
>> 	message[msg] = 1;
>> }
>>
>> Which would be exported, and could also be used by
>> smp_muxed_ipi_message_pass().
>>
>> Then in icp_rm_set_vcpu_irq(), you would do something like:
>>
>> 	if (hcore != -1) {
>> 		hcpu = hcore << threads_shift;
>> 		kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
>> 		smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
>> 		icp_native_cause_ipi_real_mode();
>> 	}
>>
>> Where icp_native_cause_ipi_real_mode() is a new hook you define in
>> icp_native.c which does the real mode write to MFRR.
>>
>> cheers
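Michael's proposed split, a generic smp_muxed_ipi_set_message() that only publishes the message plus a transport-specific hook that actually raises the IPI, can be modeled in portable C as below. This is an illustrative user-space sketch (C11 atomics stand in for the kernel's smp_mb() and xchg(); names are invented); it also shows the demux side draining all pending bits in one atomic exchange and dispatching host-action messages before call-function ones.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

enum {
    PPC_MSG_CALL_FUNCTION,
    PPC_MSG_RESCHEDULE,
    PPC_MSG_RM_HOST_ACTION,
};
#define IPI_MESSAGE(m) (1u << (m))

static _Atomic uint32_t messages;   /* one CPU's pending-message word */

/* Generic half: publish a message.  The atomic RMW provides the full
 * ordering the kernel gets from smp_mb() before the plain store.
 * The transport-specific hook (e.g. the real-mode MFRR write) would
 * be called after this to actually raise the IPI. */
static void muxed_ipi_set_message(int msg)
{
    atomic_fetch_or(&messages, IPI_MESSAGE(msg));
}

/* Handler side: drain everything in one atomic exchange (the xchg()
 * analog), record the dispatch order, and return what was pending. */
static uint32_t ipi_demux(int *order, size_t *n)
{
    uint32_t all = atomic_exchange(&messages, 0);

    /* Host-action messages are dispatched before call-function ones. */
    if (all & IPI_MESSAGE(PPC_MSG_RM_HOST_ACTION))
        order[(*n)++] = PPC_MSG_RM_HOST_ACTION;
    if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION))
        order[(*n)++] = PPC_MSG_CALL_FUNCTION;
    if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
        order[(*n)++] = PPC_MSG_RESCHEDULE;
    return all;
}
```

The ordering requirement is visible here: the message must be globally visible before the doorbell rings, because the handler snapshots and clears every pending bit in a single exchange and would otherwise miss a late store.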
[PATCH] powerpc: Export __spin_yield
Export __spin_yield so that the arch_spin_unlock() function can be
invoked from a module. This will be required for modules where we want
to take a lock that is also acquired in hypervisor real mode. Because
we want to avoid running any lockdep code (which may not be safe in
real mode), this lock needs to be an arch_spinlock_t instead of a
normal spinlock.

Signed-off-by: Suresh Warrier <warr...@linux.vnet.ibm.com>
---
Replaced export with EXPORT_SYMBOL_GPL
Updated commit log to explain what kind of modules will need to use
the arch_spin_unlock() function

 arch/powerpc/lib/locks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index bb7cfec..f31bcee 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -41,6 +41,7 @@ void __spin_yield(arch_spinlock_t *lock)
 	plpar_hcall_norets(H_CONFER,
 		get_hard_smp_processor_id(holder_cpu), yield_count);
 }
+EXPORT_SYMBOL_GPL(__spin_yield);

 /*
  * Waiting for a read lock or a write lock on a rwlock...
--
1.8.3.4
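For context on why a module that only calls arch_spin_lock()/arch_spin_unlock() pulls in __spin_yield at all: on shared-processor systems, arch_spin_lock() calls __spin_yield() while spinning so the waiting vcpu can confer its timeslice to the lock holder (via the H_CONFER hypercall). The shape of that lock/yield loop, modeled as a user-space sketch with invented names (not the kernel implementation):

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} arch_spinlock_model_t;

static int yield_calls;  /* counts how often we would confer our slice */

/* Stand-in for __spin_yield(): in the kernel this issues an H_CONFER
 * hypercall to donate the spinning vcpu's cycles to the lock holder. */
static void spin_yield_model(arch_spinlock_model_t *lock)
{
    (void)lock;
    yield_calls++;
}

static void arch_spin_lock_model(arch_spinlock_model_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire))
        spin_yield_model(lock);  /* contended: yield to the holder */
}

static void arch_spin_unlock_model(arch_spinlock_model_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Because the yield path is compiled into the out-of-line lock slow path, any module using arch_spinlock_t ends up referencing the __spin_yield symbol, which is why it has to be exported.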
Re: [PATCH] powerpc: Export __spin_yield
On 02/23/2015 09:38 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2015-02-23 at 18:10 -0600, Suresh E. Warrier wrote:
>> Export __spin_yield so that the arch_spin_unlock() function can be
>> invoked from a module.
>
> Make it EXPORT_SYMBOL_GPL. Also explain why a module might need it.

Sure, I will change that to EXPORT_SYMBOL_GPL. Just curious, though:
there is another symbol, arch_spin_unlock_wait, that is exported from
the same file without the _GPL suffix. Any idea why?

I have mentioned that this needs to be exported to call the
arch_spin_unlock() function from a module. What additional information
do you think would be useful here? Are you looking for something that
explains why a module might need to call arch_spin_unlock()?

Thanks.
-suresh
[PATCH] powerpc: Export __spin_yield
Export __spin_yield so that the arch_spin_unlock() function can be
invoked from a module.

Signed-off-by: Suresh Warrier <warr...@linux.vnet.ibm.com>
---
 arch/powerpc/lib/locks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index bb7cfec..d100de8 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -41,6 +41,7 @@ void __spin_yield(arch_spinlock_t *lock)
 	plpar_hcall_norets(H_CONFER,
 		get_hard_smp_processor_id(holder_cpu), yield_count);
 }
+EXPORT_SYMBOL(__spin_yield);

 /*
  * Waiting for a read lock or a write lock on a rwlock...
--
1.8.3.4