Re: [PATCH] powerpc/pseries: Use doorbells even if XIVE is available

2020-06-27 Thread Nicholas Piggin
Excerpts from Cédric Le Goater's message of June 26, 2020 5:17 pm:
> Adding David, 
> 
> On 6/25/20 3:11 AM, Michael Ellerman wrote:
>> Nicholas Piggin  writes:
>>> KVM supports msgsndp in guests by trapping and emulating the
>>> instruction, so it was decided to always use XIVE for IPIs if it is
>>> available. However on PowerVM systems, msgsndp can be used and gives
>>> better performance. On large systems, high XIVE interrupt rates can
>>> have sub-linear scaling, and using msgsndp can reduce the load on
>>> the interrupt controller.
>>>
>>> So switch to using core local doorbells even if XIVE is available.
>>> This reduces performance for KVM guests with an SMT topology by
>>> about 50% for ping-pong context switching between SMT vCPUs.
>> 
>> You have to take explicit steps to configure KVM in that way with qemu.
>> eg. "qemu .. -smp 8" will give you 8 SMT1 CPUs by default.
>> 
>>> An option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
>>> to get KVM performance back.
> 
> An option vector would require a PAPR change. Unless the architecture 
> reserves some bits for the implementation, but I don't think so. Same
> for CAS.
> 
>> Qemu/KVM populates /proc/device-tree/hypervisor, so we *could* look at
>> that. Though adding PowerVM/KVM specific hacks is obviously a very
>> slippery slope.
> 
> QEMU could advertise a property "emulated-msgsndp", or something similar, 
> which would be interpreted by Linux as a CPU feature and taken into account 
> when doing the IPIs.

What I'm going to do is detect KVM here (we already have a KVM detection
test using that dt property). The IPI setup code already has KVM hacks 
in it, so I don't really see the problem with puting them behind a KVM
test.

I think doing cpu ftrs or some specific entry for msgsndp in particular
is the right way to go, but in the interests of making existing KVM work
I'll do this.

Thanks,
Nick


Re: [PATCH] powerpc/pseries: Use doorbells even if XIVE is available

2020-06-26 Thread Cédric Le Goater
Adding David, 

On 6/25/20 3:11 AM, Michael Ellerman wrote:
> Nicholas Piggin  writes:
>> KVM supports msgsndp in guests by trapping and emulating the
>> instruction, so it was decided to always use XIVE for IPIs if it is
>> available. However on PowerVM systems, msgsndp can be used and gives
>> better performance. On large systems, high XIVE interrupt rates can
>> have sub-linear scaling, and using msgsndp can reduce the load on
>> the interrupt controller.
>>
>> So switch to using core local doorbells even if XIVE is available.
>> This reduces performance for KVM guests with an SMT topology by
>> about 50% for ping-pong context switching between SMT vCPUs.
> 
> You have to take explicit steps to configure KVM in that way with qemu.
> eg. "qemu .. -smp 8" will give you 8 SMT1 CPUs by default.
> 
>> An option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
>> to get KVM performance back.

An option vector would require a PAPR change. Unless the architecture 
reserves some bits for the implementation, but I don't think so. Same
for CAS.

> Qemu/KVM populates /proc/device-tree/hypervisor, so we *could* look at
> that. Though adding PowerVM/KVM specific hacks is obviously a very
> slippery slope.

QEMU could advertise a property "emulated-msgsndp", or something similar, 
which would be interpreted by Linux as a CPU feature and taken into account 
when doing the IPIs.

The CPU setup for XIVE needs a cleanup also. There is no need to allocate
interrupts for IPIs anymore in that case.

Thanks,

C. 


> 
>> diff --git a/arch/powerpc/platforms/pseries/smp.c 
>> b/arch/powerpc/platforms/pseries/smp.c
>> index 6891710833be..a737a2f87c67 100644
>> --- a/arch/powerpc/platforms/pseries/smp.c
>> +++ b/arch/powerpc/platforms/pseries/smp.c
>> @@ -188,13 +188,14 @@ static int pseries_smp_prepare_cpu(int cpu)
>>  return 0;
>>  }
>>  
>> +static void  (*cause_ipi_offcore)(int cpu) __ro_after_init;
>> +
>>  static void smp_pseries_cause_ipi(int cpu)
> 
> This is static so the name could be more descriptive, it doesn't need
> the "smp_pseries" prefix.
> 
>>  {
>> -/* POWER9 should not use this handler */
>>  if (doorbell_try_core_ipi(cpu))
>>  return;
> 
> Seems like it would be worth making that static inline so we can avoid
> the function call overhead.
> 
>> -icp_ops->cause_ipi(cpu);
>> +cause_ipi_offcore(cpu);
>>  }
>>  
>>  static int pseries_cause_nmi_ipi(int cpu)
>> @@ -222,10 +223,7 @@ static __init void pSeries_smp_probe_xics(void)
>>  {
>>  xics_smp_probe();
>>  
>> -if (cpu_has_feature(CPU_FTR_DBELL) && !is_secure_guest())
>> -smp_ops->cause_ipi = smp_pseries_cause_ipi;
>> -else
>> -smp_ops->cause_ipi = icp_ops->cause_ipi;
>> +smp_ops->cause_ipi = icp_ops->cause_ipi;
>>  }
>>  
>>  static __init void pSeries_smp_probe(void)
>> @@ -238,6 +236,18 @@ static __init void pSeries_smp_probe(void)
> 
> The comment just above here says:
> 
>   /*
>* Don't use P9 doorbells when XIVE is enabled. IPIs
>* using MMIOs should be faster
>*/
>>  xive_smp_probe();
> 
> Which is no longer true.
> 
>>  else
>>  pSeries_smp_probe_xics();
> 
> I think you should just fold this in, it would make the logic slightly
> easier to follow.
> 
>> +/*
>> + * KVM emulates doorbells by reading the instruction, which
>> + * can't be done if the guest is secure. If a secure guest
>> + * runs under PowerVM, it could use msgsndp but would need a
>> + * way to distinguish.
>> + */
> 
> It's not clear what it needs to distinguish: That it's running under
> PowerVM and therefore *can* use msgsndp even though it's secure.
> 
> Also the comment just talks about the is_secure_guest() test, which is
> not obvious on first reading.
> 
>> +if (cpu_has_feature(CPU_FTR_DBELL) &&
>> +cpu_has_feature(CPU_FTR_SMT) && !is_secure_guest()) {
>> +cause_ipi_offcore = smp_ops->cause_ipi;
>> +smp_ops->cause_ipi = smp_pseries_cause_ipi;
>> +}
> 
> Because we're at the tail of the function I think this would be clearer
> if it used early returns, eg:
> 
>   // If the CPU doesn't have doorbells then we must use xics/xive
>   if (!cpu_has_feature(CPU_FTR_DBELL))
>   return;
> 
>   // If the CPU doesn't have SMT then doorbells don't help us
>   if (!cpu_has_feature(CPU_FTR_SMT))
>   return;
> 
>   // Secure guests can't use doorbells because ...
>   if (!is_secure_guest()
>   return;
> 
>   /*
>  * Otherwise we want to use doorbells for sibling threads and
>  * xics/xive for IPIs off the core, because it performs better
>  * on large systems ...
>  */
> cause_ipi_offcore = smp_ops->cause_ipi;
>   smp_ops->cause_ipi = smp_pseries_cause_ipi;
> }
> 
> 
> cheers
> 



Re: [PATCH] powerpc/pseries: Use doorbells even if XIVE is available

2020-06-26 Thread Cédric Le Goater
>>> An option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
>>> to get KVM performance back.
> 
> An option vector would require a PAPR change. Unless the architecture 
> reserves some bits for the implementation, but I don't think so. Same
> for CAS.
> 
>> Qemu/KVM populates /proc/device-tree/hypervisor, so we *could* look at
>> that. Though adding PowerVM/KVM specific hacks is obviously a very
>> slippery slope.
> 
> QEMU could advertise a property "emulated-msgsndp", or something similar, 
> which would be interpreted by Linux as a CPU feature and taken into account 
> when doing the IPIs.

Could we remove msgsndp support from HFSCR in KVM and test it in pseries ? 

C.


Re: [PATCH] powerpc/pseries: Use doorbells even if XIVE is available

2020-06-26 Thread Cédric Le Goater
[ ...  ]

>>> An option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
>>> to get KVM performance back.
> 
> An option vector would require a PAPR change. Unless the architecture 
> reserves some bits for the implementation, but I don't think so. Same
> for CAS.
> 
>> Qemu/KVM populates /proc/device-tree/hypervisor, so we *could* look at
>> that. Though adding PowerVM/KVM specific hacks is obviously a very
>> slippery slope.
> 
> QEMU could advertise a property "emulated-msgsndp", or something similar, 
> which would be interpreted by Linux as a CPU feature and taken into account 
> when doing the IPIs.

This is really a PowerVM optimization. 

> The CPU setup for XIVE needs a cleanup also. There is no need to allocate
> interrupts for IPIs anymore in that case.

We need to keep these for the other cores. The XIVE layer is unchanged.
 
C.