On 11/23/20 12:16 PM, Greg Kurz wrote:
> On Mon, 23 Nov 2020 10:46:38 +0100
> Cédric Le Goater <c...@kaod.org> wrote:
>
>> On 11/20/20 6:46 PM, Greg Kurz wrote:
>>> We're going to kill the "nr_ends" field in a subsequent patch.
>>
>> why ? it is one of the tables of the controller and it's part of
>> the main XIVE concepts. Conceptually, we could let the machine
>> dimension it with an arbitrary value as OPAL does. The controller
>> would fail when the table is fully used.
>>
>
> The idea is that the sPAPR machine's only true need is to create a
> controller that can accommodate up to a certain number of vCPU ids.
> It doesn't really need to know about the END itself IMHO.
>
> This being said, if we decide to pass both spapr_max_server_number()
> and smp.max_cpus down to the backends as function arguments, we won't
> have to change "nr_ends" at all.
I would prefer that but I am still not sure what they represent.

Looking at the sPAPR XIVE code, we deal with numbers/ranges in the
following places today:

* spapr_xive_dt()

  It defines a range of interrupt numbers to be used by the guest for
  the threads'/vCPUs' IPIs. It's a subset of the interrupt numbers in:

      [ SPAPR_IRQ_IPI - SPAPR_IRQ_EPOW [

  These are not vCPU ids. Since these interrupt numbers will be
  considered free to use by the OS, it makes sense to pre-claim them.
  But claiming an interrupt number in the guest can potentially set up,
  through the KVM device, a mapping on the host and in HW. See below
  why this can be a problem.

* kvmppc_xive_cpu_connect()

  This sizes the NVT tables in OPAL for the guest. This is the max
  number of vCPUs of the guest (not vCPU ids).

* spapr_irq_init()

  This is where the IPI interrupt numbers are claimed today: directly
  in QEMU and KVM if the machine is running XIVE only, indirectly if
  it's dual, first in QEMU and then in KVM when the machine switches
  interrupt mode at CAS.

  The problem is that the underlying XIVE resources in HW are allocated
  where the QEMU process is running, which is not the best option when
  the vCPUs are pinned on different chips. My patchset was trying to
  improve that by claiming the IPI on demand when the vCPU is connected
  to the KVM device. But it was using the vCPU id as the IPI interrupt
  number, which is utterly wrong: the guest OS could use any number in
  the range exposed in the DT.

The last patch you sent was going in the right direction, I think,
that is, to claim the IPI when the guest OS requests it:

  http://patchwork.ozlabs.org/project/qemu-devel/patch/160528045027.804522.6161091782230763832.st...@bahia.lan/

But I don't understand why it was so complex. It should be like the
MSIs claimed by PCI devices.

All this to say that we need to better size the range in the
"ibm,xive-lisn-ranges" property if that's broken for vSMT.
Then, I think the IPIs can be treated just like the PCI MSIs, but they
need to be claimed first. That's the ugly part. Should we add a special
check in h_int_set_source_config to deal with unclaimed IPIs that are
being configured ?

C.