On 05/16/2017 06:10 PM, Greg Kurz wrote: > On Tue, 16 May 2017 17:18:27 +0200 > Cédric Le Goater <c...@kaod.org> wrote: > >> On 05/16/2017 02:55 PM, Laurent Vivier wrote: >>> On 16/05/2017 14:50, Cédric Le Goater wrote: >>>> On 05/16/2017 02:03 PM, Laurent Vivier wrote: >>>>> On 26/04/2017 09:00, David Gibson wrote: >>>>>> From: Cédric Le Goater <c...@kaod.org> >>>>>> >>>>>> Today, all the ICPs are created before the CPUs, stored in an array >>>>>> under the sPAPR machine and linked to the CPU when the core threads >>>>>> are realized. This modeling brings some complexity when a lookup in >>>>>> the array is required and it can be simplified by allocating the ICPs >>>>>> when the CPUs are. >>>>>> >>>>>> This is the purpose of this proposal which introduces a new 'icp_type' >>>>>> field under the machine and creates the ICP objects of the right type >>>>>> (KVM or not) before the PowerPCCPU object are. >>>>>> >>>>>> This change allows more cleanups : the removal of the icps array under >>>>>> the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id() >>>>>> helper. >>>>>> >>>>>> Signed-off-by: Cédric Le Goater <c...@kaod.org> >>>>>> Reviewed-by: David Gibson <da...@gibson.dropbear.id.au> >>>>>> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> >>>>>> --- >>>>>> hw/intc/xics.c | 11 ----------- >>>>>> hw/ppc/spapr.c | 47 >>>>>> ++++++++++++++--------------------------------- >>>>>> hw/ppc/spapr_cpu_core.c | 18 ++++++++++++++---- >>>>>> include/hw/ppc/spapr.h | 2 +- >>>>>> include/hw/ppc/xics.h | 2 -- >>>>>> 5 files changed, 29 insertions(+), 51 deletions(-) >>>>>> >>>>> >>>>> This commit breaks CPU re-hotplugging with KVM >>>>> >>>>> the sequence "device_add, device_del, device_add" brings to the >>>>> following error message: >>>>> >>>>> Unable to connect CPUx to kernel XICS: Device or resource busy >>>>> >>>>> It comes from icp_kvm_cpu_setup(): >>>>> >>>>> ... >>>>> ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0, kernel_xics_fd, >>>>> kvm_arch_vcpu_id(cs)); >>>>> if (ret < 0) { >>>>> error_report("Unable to connect CPU%ld to kernel XICS: %s", >>>>> kvm_arch_vcpu_id(cs), strerror(errno)); >>>>> exit(1); >>>>> } >>>>> .. >>>>> >>>>> It should be protected by cap_irq_xics_enabled: >>>>> >>>>> ... >>>>> /* >>>>> * If we are reusing a parked vCPU fd corresponding to the CPU >>>>> * which was hot-removed earlier we don't have to renable >>>>> * KVM_CAP_IRQ_XICS capability again. >>>>> */ >>>>> if (icp->cap_irq_xics_enabled) { >>>>> return; >>>>> } >>>>> >>>>> ... >>>>> ret = kvm_vcpu_enable_cap(...); >>>>> ... >>>>> icp->cap_irq_xics_enabled = true; >>>>> ... >>>>> >>>>> But since this commit, "icp" is a new object on each call: >>>>> >>>>> spapr_cpu_core_realize_child() >>>>> ... >>>>> obj = object_new(spapr->icp_type); >>>>> ... >>>>> xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj)); >>>>> ... >>>>> icpc->cpu_setup(icp, cpu); -> icp_kvm_cpu_setup() >>>>> ... >>>>> ... >>>>> >>>>> and "cap_irq_xics_enabled" is reinitialized. >>>>> >>>>> Any idea how to fix that? >>>> >>>> it seems that a cleanup is not done in the kernel. We are missing >>>> a way to call kvmppc_xics_free_icp() from QEMU. Today the only >>>> way is to destroy the vcpu. >>> >>> The commit introducing this hack, for reference: >>> >>> commit a45863bda90daa8ec39e5a312b9734fd4665b016 >>> Author: Bharata B Rao <bhar...@linux.vnet.ibm.com> >>> Date: Thu Jul 2 16:23:20 2015 +1000 >>> >>> xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled >>> >>> When supporting CPU hot removal by parking the vCPU fd and reusing >>> it during hotplug again, there can be cases where we try to reenable >>> KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled. >>> Introduce a boolean member in ICPState to track this and don't >>> reenable the CAP if it was already enabled earlier. >>> >>> Re-enabling this CAP should ideally work, but currently it results in >>> kernel trying to create and associate ICP with this vCPU and that >>> fails since there is already an ICP associated with it. Hence this >>> patch is needed to work around this problem in the kernel. >>> >>> This change allows CPU hot removal to work for sPAPR. >>> >>> Signed-off-by: Bharata B Rao <bhar...@linux.vnet.ibm.com> >>> Reviewed-by: David Gibson <da...@gibson.dropbear.id.au> >>> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> >>> Signed-off-by: Alexander Graf <ag...@suse.de> >> >> OK. >> >> Greg is looking at re-adding the ICPState array because of a >> migration issue with older machines. We might need to do so >> unconditionally ... >> > > That would be a pity to carry on with the pre-allocated ICPStates for > new machine types just because of that... What about keeping track > of all the cap_irq_xics_enabled flags in a separate max_cpus sized > static array ?
Could we use 'cpu->unplug' instead ? C. >> But for that specific issue, I think it would have been better >> to clean up the kernel state. Is that possible ? >> > > Commit 4c055ab54fae ("cpu: Reclaim vCPU objects") gives some more details > on why we don't destroy the vCPU in KVM on unplug, but rather park the vCPU > fd for later use... so I'm not sure we can clean up the kernel state. > > But since the vCPU is still present, maybe we can find a way to tell KVM > that we want to reuse an already present ICP ? > >> Thanks, >> >> C. >> >> >>>> Else we need to reintroduce the array of icps (again) to keep some >>>> xics state ... but that just sucks :/ Let me think about it. >>>> >>> >>> Thanks, >>> Laurent >>>> C. >>>> >>> >> >