Re: [PATCH 00/14] KVM: PPC: Book3S HV: PCI Passthrough Interrupt Optimizations

2016-02-26 Thread Suresh E. Warrier


On 02/26/2016 12:40 PM, Suresh Warrier wrote:
> This patch set adds support for handling interrupts for PCI adapters
> entirely in the guest under the right conditions. When an interrupt
> is received by KVM in real mode, if the interrupt is from a PCI
> passthrough adapter owned by the guest, KVM will update the virtual
> ICP for the VCPU that is the target of the interrupt entirely in
> real mode and generate the virtual interrupt. If the VCPU is not
> running in the guest, it will wake up the VCPU. As an optimization,
> it will also update the affinity of the interrupt to directly
> target the CPU (core) where this VCPU is being scheduled.
> 
> KVM needs a mapping from a hardware interrupt number in the host
> to the virtual hardware interrupt (GSI) that needs to be injected
> into the guest. This patch set takes advantage of the IRQ bypass
> manager feature to create this mapping. For now, we allocate and
> manage a separate mapping structure per VM.
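
As a rough sketch of what such a per-VM structure might look like (the
names, field types and sizes below are illustrative assumptions, not
the actual definitions from this series):

	struct kvmppc_irq_map {
		u32	r_hwirq;	/* host hardware IRQ number */
		u32	v_hwirq;	/* virtual IRQ (guest GSI) */
		struct irq_desc	*desc;	/* host IRQ descriptor */
	};

	#define KVMPPC_PIRQ_CACHED	16	/* assumed cache size */
	#define KVMPPC_PIRQ_MAPPED	1024	/* assumed per-VM capacity */

	struct kvmppc_passthru_irqmap {
		int	n_cached;
		int	n_mapped;
		struct kvmppc_irq_map	cached[KVMPPC_PIRQ_CACHED];
		struct kvmppc_irq_map	mapped[KVMPPC_PIRQ_MAPPED];
	};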
> 
> Although a mapping is created for every passthrough IRQ requested
> in the guest, we also maintain a cache of mappings to speed up the
> search. For now, KVM real mode code only looks in the cache for a
> mapping. If no mapping is found, we fall back on the usual interrupt
> routing mechanism - switching back to the host and running the VFIO
> interrupt handler.
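
The real-mode fast path described here might then look roughly like the
following sketch (the helper name and cache layout follow the
illustrative structure above, and are likewise assumptions):

	/* Sketch only: real mode searches just the small cache; a miss
	 * means we exit to the host and take the slow path there. */
	static struct kvmppc_irq_map *get_irqmap_rm(
			struct kvmppc_passthru_irqmap *pimap, u32 xisr)
	{
		int i;

		for (i = 0; i < pimap->n_cached; i++)
			if (pimap->cached[i].r_hwirq == xisr)
				return &pimap->cached[i];

		return NULL;	/* uncached: fall back to host + VFIO handler */
	}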
> 
> This is based on 4.5-rc1 plus the patch set in
> http://www.spinics.net/lists/kvm-ppc/msg11131.html, since this
> series depends on vmalloc_to_phys() being public.
> 
> Suresh Warrier (14):
>   powerpc: Add simple cache inhibited MMIO accessors
>   KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function
>   KVM: PPC: select IRQ_BYPASS_MANAGER
>   KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap
>   KVM: PPC: Book3S HV: Enable IRQ bypass
>   KVM: PPC: Book3S HV: Caching for passthrough IRQ map
>   KVM: PPC: Book3S HV: Handle passthrough interrupts in guest
>   KVM: PPC: Book3S HV: Complete passthrough interrupt in host
>   KVM: PPC: Book3S HV: Enable KVM real mode handling of passthrough IRQs
>   KVM: PPC: Book3S HV: Dump irqmap in debugfs
>   KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass
>   KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode
>   KVM: PPC: Book3S HV: Change affinity for passthrough IRQ
>   KVM: PPC: Book3S HV: Counters for passthrough IRQ stats
> 
>  arch/powerpc/include/asm/io.h |  28 +++
>  arch/powerpc/include/asm/kvm_asm.h|  10 +
>  arch/powerpc/include/asm/kvm_book3s.h |   1 +
>  arch/powerpc/include/asm/kvm_host.h   |  25 +++
>  arch/powerpc/include/asm/kvm_ppc.h|  28 +++
>  arch/powerpc/include/asm/pnv-pci.h|   1 +
>  arch/powerpc/kvm/Kconfig  |   2 +
>  arch/powerpc/kvm/book3s.c |  45 +
>  arch/powerpc/kvm/book3s_hv.c  | 318 +-
>  arch/powerpc/kvm/book3s_hv_builtin.c  | 157 +++
>  arch/powerpc/kvm/book3s_hv_rm_xics.c  | 181 +
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 226 -
>  arch/powerpc/kvm/book3s_xics.c|  68 ++-
>  arch/powerpc/kvm/book3s_xics.h|   3 +
>  arch/powerpc/platforms/powernv/pci-ioda.c |  14 +-
>  15 files changed, 1013 insertions(+), 94 deletions(-)
> 


Re: [PATCH v3 8/9] KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU

2015-12-21 Thread Suresh E. Warrier
This patch adds support to real-mode KVM to search for a core
running in the host partition and send it an IPI message with
the VCPU to be woken. This avoids having to switch to the host
partition to complete an H_IPI hypercall when the VCPU which
is the target of the H_IPI is not loaded (i.e., not running
in the guest).

The patch also includes the support in the IPI handler running
in the host to do the wakeup by calling kvmppc_xics_ipi_action
for the PPC_MSG_RM_HOST_ACTION message.

When a guest is being destroyed, we need to ensure that there
are no pending IPIs waiting to wake up a VCPU before we free
the VCPUs of the guest. This is accomplished by:
- Forcing a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
  before freeing any VCPUs in kvm_arch_destroy_vm() (sketched below).
- Ensuring that any PPC_MSG_RM_HOST_ACTION messages are executed
  before any PPC_MSG_CALL_FUNCTION messages.
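
The powerpc.c hunk implementing the first point is not shown below; a
minimal sketch of it, assuming the rest of kvm_arch_destroy_vm() is
unchanged, would be:

	void kvm_arch_destroy_vm(struct kvm *kvm)
	{
		unsigned int i;
		struct kvm_vcpu *vcpu;

	#ifdef CONFIG_KVM_XICS
		/*
		 * Force a PPC_MSG_CALL_FUNCTION IPI through every CPU.
		 * Because smp_ipi_demux() handles PPC_MSG_RM_HOST_ACTION
		 * before PPC_MSG_CALL_FUNCTION, any pending real-mode
		 * wakeups drain before the VCPUs are freed below.
		 */
		kick_all_cpus_sync();
	#endif

		kvm_for_each_vcpu(i, vcpu, kvm)
			kvm_arch_vcpu_free(vcpu);
		/* ... remainder unchanged ... */
	}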

Signed-off-by: Suresh Warrier 
---
Fixed build break for CONFIG_SMP=n (thanks to Mike Ellerman for
pointing that out).

 arch/powerpc/kernel/smp.c| 11 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 92 ++--
 arch/powerpc/kvm/powerpc.c   | 10 
 3 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e222efc..cb8be5d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -257,6 +257,17 @@ irqreturn_t smp_ipi_demux(void)
 
do {
		all = xchg(&info->messages, 0);
+#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+   /*
+* Must check for PPC_MSG_RM_HOST_ACTION messages
+* before PPC_MSG_CALL_FUNCTION messages because when
+* a VM is destroyed, we call kick_all_cpus_sync()
+* to ensure that any pending PPC_MSG_RM_HOST_ACTION
+* messages have completed before we free any VCPUs.
+*/
+   if (all & IPI_MESSAGE(PPC_MSG_RM_HOST_ACTION))
+   kvmppc_xics_ipi_action();
+#endif
if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION))
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 43ffbfe..e673fb9 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -51,11 +51,84 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
 
 /* -- ICP routines -- */
 
+#ifdef CONFIG_SMP
+static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu)
+{
+   int hcpu;
+
+   hcpu = hcore << threads_shift;
+   kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
+   smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
+   icp_native_cause_ipi_rm(hcpu);
+}
+#else
+static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { }
+#endif
+
+/*
+ * We start the search from our current CPU Id in the core map
+ * and go in a circle until we get back to our ID looking for a
+ * core that is running in host context and that hasn't already
+ * been targeted for another rm_host_ops.
+ *
+ * In the future, we could consider using a fairer algorithm (one
+ * that distributes the IPIs better).
+ *
+ * Returns -1 if no core could be found in the host.
+ * Else, returns a core Id which has been reserved for use.
+ */
+static inline int grab_next_hostcore(int start,
+   struct kvmppc_host_rm_core *rm_core, int max, int action)
+{
+   bool success;
+   int core;
+   union kvmppc_rm_state old, new;
+
+   for (core = start + 1; core < max; core++)  {
+   old = new = READ_ONCE(rm_core[core].rm_state);
+
+   if (!old.in_host || old.rm_action)
+   continue;
+
+   /* Try to grab this host core if not taken already. */
+   new.rm_action = action;
+
+	success = cmpxchg64(&rm_core[core].rm_state.raw,
+   old.raw, new.raw) == old.raw;
+   if (success) {
+   /*
+* Make sure that the store to the rm_action is made
+* visible before we return to caller (and the
+* subsequent store to rm_data) to synchronize with
+* the IPI handler.
+*/
+   smp_wmb();
+   return core;
+   }
+   }
+
+   return -1;
+}
+
+static inline int find_available_hostcore(int action)
+{
+   int core;
+   int my_core = smp_processor_id() >> threads_shift;
+   struct kvmppc_host_rm_core *rm_core = kvmppc_host_rm_ops_hv->rm_core;
+
+   core = grab_next_hostcore(my_core, rm_core, cpu_nr_cores(), action);
+   if (core == -1)
+		core = grab_next_hostcore(core, rm_core, my_core, action);
+
+	return core;
+}

Re: [PATCH v3 9/9] KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection

2015-12-21 Thread Suresh E. Warrier
Redirecting the wakeup of a VCPU from the H_IPI hypercall to
a core running in the host is usually a good idea: most workloads
seemed to benefit. However, in one heavily interrupt-driven SMT1
workload, some regression was observed. This patch adds a kvm_hv
module parameter called h_ipi_redirect to control this feature.

The default value for this tunable is 1, i.e. the feature is enabled.
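
Since the parameter is registered writable (S_IRUGO | S_IWUSR in the
patch below), it should be controllable both at load time and at
runtime through the standard module parameter interfaces, roughly:

	# at module load time
	modprobe kvm_hv h_ipi_redirect=0

	# at runtime, via the standard module parameter sysfs file
	echo 0 > /sys/module/kvm_hv/parameters/h_ipi_redirect
	cat /sys/module/kvm_hv/parameters/h_ipi_redirect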

Signed-off-by: Suresh Warrier 
---
Resending the patch with an updated diff, since an earlier
patch (8/9) had to be resent to fix a build break.

 arch/powerpc/include/asm/kvm_ppc.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c | 11 +++
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  5 -
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 1b93519..29d1442 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -448,6 +448,7 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
 extern void kvmppc_xics_ipi_action(void);
+extern int h_ipi_redirect;
 #else
 static inline void kvmppc_alloc_host_rm_ops(void) {};
 static inline void kvmppc_free_host_rm_ops(void) {};
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d6280ed..182ec84 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,17 @@ static int target_smt_mode;
 module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
+#ifdef CONFIG_KVM_XICS
+static struct kernel_param_ops module_param_ops = {
+   .set = param_set_int,
+   .get = param_get_int,
+};
+
+module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect,
+   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
+#endif
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index e673fb9..980d8a6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -24,6 +24,9 @@
 
 #define DEBUG_PASSUP
 
+int h_ipi_redirect = 1;
+EXPORT_SYMBOL(h_ipi_redirect);
+
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
u32 new_irq);
 
@@ -148,7 +151,7 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
cpu = vcpu->arch.thread_cpu;
if (cpu < 0 || cpu >= nr_cpu_ids) {
hcore = -1;
-   if (kvmppc_host_rm_ops_hv)
+   if (kvmppc_host_rm_ops_hv && h_ipi_redirect)
hcore = find_available_hostcore(XICS_RM_KICK_VCPU);
if (hcore != -1) {
icp_send_hcore_msg(hcore, vcpu);
-- 
1.8.3.4


Re: [2/2] powerpc/smp: Add smp_muxed_ipi_rm_message_pass

2015-11-25 Thread Suresh E. Warrier
Hi Mike,

After looking at this a little more, I think it would perhaps
be better to define the real-mode function that causes the IPI
in book3s_hv_rm_xics.c, along with the other real-mode functions
that operate on the XICS.
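
As an illustration of what that could look like, here is a minimal
sketch (the function name and exact placement are assumptions, reusing
the physical MFRR write from the patch quoted below):

	/* Sketch only: cause an IPI from real mode by writing the target
	 * CPU's MFRR through its saved physical address, rather than via
	 * smp_ops->cause_ipi(), which uses an ioremapped address. */
	static void icp_rm_cause_ipi(int cpu)
	{
		unsigned long xics_phys;

		/* cause_ipi functions must include a full barrier
		 * before the write that triggers the IPI. */
		xics_phys = paca[cpu].kvm_hstate.xics_phys;
		out_rm8((u8 *)(xics_phys + XICS_MFRR), IPI_PRIORITY);
	}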

Hope this is acceptable to you. If not, we can discuss when
I re-submit the patch.

Thanks.
-suresh


On 11/16/2015 03:34 PM, Suresh E. Warrier wrote:
> Hi Mike,
> 
> The changes you proposed look nicer than what I have here.
> I will get that coded and tested and re-submit.
> 
> Thanks.
> -suresh
> 
> On 11/15/2015 11:53 PM, Michael Ellerman wrote:
>> Hi Suresh,
>>
>> On Thu, 2015-10-29 at 23:40:45 UTC, "Suresh E. Warrier" wrote:
>>> This function supports IPI message passing for real
>>> mode callers.
>>>
>>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>>> index a53a130..8c07bfad 100644
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -235,6 +238,33 @@ void smp_muxed_ipi_message_pass(int cpu, int msg)
>>> smp_ops->cause_ipi(cpu, info->data);
>>>  }
>>>  
>>> +#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
>>> +/*
>>> + * Message passing code for real mode callers. It does not use the
>>> + * smp_ops->cause_ipi function to cause an IPI, because those functions
>>> + * access the MFRR through an ioremapped address.
>>> + */
>>> +void smp_muxed_ipi_rm_message_pass(int cpu, int msg)
>>> +{
>>> +	struct cpu_messages *info = &per_cpu(ipi_message, cpu);
>>> +	char *message = (char *)&info->messages;
>>> +   unsigned long xics_phys;
>>> +
>>> +   /*
>>> +* Order previous accesses before accesses in the IPI handler.
>>> +*/
>>> +   smp_mb();
>>> +   message[msg] = 1;
>>> +
>>> +   /*
>>> +* cause_ipi functions are required to include a full barrier
>>> +* before doing whatever causes the IPI.
>>> +*/
>>> +   xics_phys = paca[cpu].kvm_hstate.xics_phys;
>>> +   out_rm8((u8 *)(xics_phys + XICS_MFRR), IPI_PRIORITY);
>>> +}
>>> +#endif
>>
>>
>> I'm not all that happy with this. This function does two things, one of which
>> belongs in this file (setting message), and the other which definitely does
>> not (the XICS part).
>>
>> I think the end result would be cleaner if we did something like:
>>
>> void smp_muxed_ipi_set_message(int cpu, int msg)
>> {
>> 	struct cpu_messages *info = &per_cpu(ipi_message, cpu);
>> 	char *message = (char *)&info->messages;
>>
>>  /*
>>   * Order previous accesses before accesses in the IPI handler.
>>   */
>>  smp_mb();
>>  message[msg] = 1;
>> }
>>
>> Which would be exported, and could also be used by 
>> smp_muxed_ipi_message_pass().
>>
>> Then in icp_rm_set_vcpu_irq(), you would do something like:
>>
>>  if (hcore != -1) {
>>  hcpu = hcore << threads_shift;
>>  kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
>>  smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
>> 		icp_native_cause_ipi_real_mode(hcpu);
>>  }
>>
>> Where icp_native_cause_ipi_real_mode() is a new hook you define in
>> icp_native.c which does the real mode write to MFRR.
>>
>> cheers
>>


[PATCH] powerpc: Export __spin_yield

2015-02-25 Thread Suresh E. Warrier
Export __spin_yield so that the arch_spin_unlock() function can
be invoked from a module. This will be required for modules where
we want to take a lock that is also is acquired in hypervisor
real mode. Because we want to avoid running any lockdep code
(which may not be safe in real mode), this lock needs to be 
an arch_spinlock_t instead of a normal spinlock.

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
---
Changed the export to EXPORT_SYMBOL_GPL.
Updated the commit log to explain what kind of modules will need
to use the arch_spin_unlock() function.
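
To illustrate the use case from the commit message: a minimal sketch
of a module-side caller, assuming some state that hypervisor real-mode
code also updates (all names below are invented for exposition):

	#include <linux/spinlock.h>

	/* Lock protecting state that is also touched from hypervisor
	 * real mode; arch_spinlock_t avoids lockdep, which may not be
	 * safe to run in real mode. */
	static arch_spinlock_t shared_rm_lock = __ARCH_SPIN_LOCK_UNLOCKED;

	static void module_update_shared_state(void)
	{
		/* arch_spin_lock() may call __spin_yield() on a shared
		 * processor, which is why the symbol must be exported. */
		arch_spin_lock(&shared_rm_lock);
		/* ... update the state shared with real-mode code ... */
		arch_spin_unlock(&shared_rm_lock);
	}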

 arch/powerpc/lib/locks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index bb7cfec..f31bcee 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -41,6 +41,7 @@ void __spin_yield(arch_spinlock_t *lock)
plpar_hcall_norets(H_CONFER,
get_hard_smp_processor_id(holder_cpu), yield_count);
 }
+EXPORT_SYMBOL_GPL(__spin_yield);
 
 /*
  * Waiting for a read lock or a write lock on a rwlock...
-- 
1.8.3.4


Re: [PATCH] powerpc: Export __spin_yield

2015-02-24 Thread Suresh E. Warrier
On 02/23/2015 09:38 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2015-02-23 at 18:10 -0600, Suresh E. Warrier wrote:
>> Export __spin_yield so that the arch_spin_unlock() function
>> can be invoked from a module.
>
> Make it EXPORT_SYMBOL_GPL. Also explain why a module might need it.

Sure, I will change that to EXPORT_SYMBOL_GPL. Just curious, though:
there is another symbol, arch_spin_unlock_wait, that is exported from
the same file without the _GPL suffix. Any idea why?

I have mentioned that this needs to be exported to call the
arch_spin_unlock() function from a module. What additional information
do you think would be useful here? Are you looking for something
that explains why a module might need to call arch_spin_unlock()?

Thanks.
-suresh


[PATCH] powerpc: Export __spin_yield

2015-02-23 Thread Suresh E. Warrier
Export __spin_yield so that the arch_spin_unlock() function
can be invoked from a module.

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
---
 arch/powerpc/lib/locks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index bb7cfec..d100de8 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -41,6 +41,7 @@ void __spin_yield(arch_spinlock_t *lock)
plpar_hcall_norets(H_CONFER,
get_hard_smp_processor_id(holder_cpu), yield_count);
 }
+EXPORT_SYMBOL(__spin_yield);

 /*
  * Waiting for a read lock or a write lock on a rwlock...
-- 
1.8.3.4
