On 11/23/20 12:16 PM, Greg Kurz wrote:
> On Mon, 23 Nov 2020 10:46:38 +0100
> Cédric Le Goater <c...@kaod.org> wrote:
> 
>> On 11/20/20 6:46 PM, Greg Kurz wrote:
>>> We're going to kill the "nr_ends" field in a subsequent patch.
>>
>> Why? It is one of the tables of the controller and it's part of
>> the main XIVE concepts. Conceptually, we could let the machine
>> dimension it with an arbitrary value as OPAL does. The controller
>> would fail when the table is fully used.
>>
> 
> The idea is that the sPAPR machine's only true need is to create a
> controller that can accommodate up to a certain number of vCPU ids.
> It doesn't really need to know about the END itself IMHO.
>
> This being said, if we decide to pass both spapr_max_server_number()
> and smp.max_cpus down to the backends as function arguments, we won't
> have to change "nr_ends" at all.

I would prefer that, but I am still not sure what they represent.

Looking at the sPAPR XIVE code, we deal with numbers/ranges in the 
following places today.

 * spapr_xive_dt() 

   It defines a range of interrupt numbers to be used by the guest 
   for the threads/vCPUs IPIs. It's a subset of interrupt numbers 
   in:

                [ SPAPR_IRQ_IPI, SPAPR_IRQ_EPOW )

   These are not vCPU ids.

   Since these interrupt numbers will be considered as free to use
   by the OS, it makes sense to pre-claim them. But claiming an 
   interrupt number in the guest can potentially set up, through 
   the KVM device, a mapping on the host and in HW. See below why
   this can be a problem.
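   To make the eager-claim problem concrete, here is a toy model (not
   QEMU code; the range bounds and helper name are made up for the
   sketch) of pre-claiming every interrupt number in the range that
   "ibm,xive-lisn-ranges" advertises to the guest:

   ```c
   #include <assert.h>
   #include <stdbool.h>
   #include <stdio.h>

   /* Hypothetical values, for illustration only: the real range is
    * [SPAPR_IRQ_IPI, SPAPR_IRQ_EPOW) as laid out by the sPAPR machine. */
   #define SPAPR_IRQ_IPI   0x0
   #define SPAPR_IRQ_EPOW  0x1000

   static bool claimed[SPAPR_IRQ_EPOW];

   /* Pre-claim every interrupt number in the IPI range up front.
    * The downside discussed above: each claim may set up a host-side/HW
    * mapping through the KVM device, all allocated on the chip where
    * the QEMU process happens to be running. */
   static void preclaim_ipi_range(void)
   {
       for (int lisn = SPAPR_IRQ_IPI; lisn < SPAPR_IRQ_EPOW; lisn++) {
           claimed[lisn] = true;
       }
   }

   int main(void)
   {
       preclaim_ipi_range();
       assert(claimed[SPAPR_IRQ_IPI]);
       assert(claimed[SPAPR_IRQ_EPOW - 1]);
       printf("all %d IPI numbers pre-claimed\n",
              SPAPR_IRQ_EPOW - SPAPR_IRQ_IPI);
       return 0;
   }
   ```

   Every number is backed whether or not the guest ever uses it, which
   is exactly what makes eager claiming wasteful and badly placed.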

 * kvmppc_xive_cpu_connect()   

   This sizes the NVT tables in OPAL for the guest. This is the  
   max number of vCPUs of the guest (not vCPU ids).
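   The count-vs-id distinction matters because with vSMT the vCPU ids
   are spread out. A rough sketch with made-up numbers (not taken from
   a real machine config):

   ```c
   #include <assert.h>
   #include <stdio.h>

   /* Illustrative values only. */
   #define MAX_CPUS   8   /* smp.max_cpus: how many vCPUs can exist */
   #define THREADS    2   /* threads per core in the guest          */
   #define VSMT       8   /* id stride per core with virtual SMT    */

   int main(void)
   {
       /* With vSMT, core n's first thread gets id n * VSMT, so the
        * largest vCPU id is much bigger than the vCPU count. */
       int cores = MAX_CPUS / THREADS;
       int max_server_number = cores * VSMT; /* roughly what
                                                spapr_max_server_number()
                                                accounts for */

       assert(max_server_number > MAX_CPUS);
       printf("max vCPU id space: %d, vCPU count: %d\n",
              max_server_number, MAX_CPUS);
       return 0;
   }
   ```

   Sizing the NVT tables wants the count; anything indexed by server
   number wants the id space.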

 * spapr_irq_init()

   This is where the IPI interrupt numbers are claimed today. 
   Directly in QEMU and KVM if the machine is running XIVE only,
   indirectly if it's dual: first in QEMU and then in KVM when
   the machine switches interrupt mode at CAS.

   The problem is that the underlying XIVE resources in HW are 
   allocated where the QEMU process is running. Which is not the
   best option when the vCPUs are pinned on different chips.

   My patchset was trying to improve that by claiming the IPI on
   demand when the vCPU is connected to the KVM device. But it
   was using the vCPU id as the IPI interrupt number, which is
   utterly wrong: the guest OS could use any number in the range
   exposed in the DT.
   
   The last patch you sent was going in the right direction, I think:
   claim the IPI when the guest OS requests it.

   
http://patchwork.ozlabs.org/project/qemu-devel/patch/160528045027.804522.6161091782230763832.st...@bahia.lan/
   
   But I don't understand why it was so complex. It should be like
   the MSIs claimed by PCI devices.
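   The MSI-like scheme could reduce to a plain allocate-on-first-use
   pattern. A toy model (invented helpers, not the actual QEMU API):

   ```c
   #include <assert.h>
   #include <stdbool.h>
   #include <stdio.h>

   #define IPI_RANGE_SIZE 4096  /* hypothetical size of the lisn range */

   static bool claimed[IPI_RANGE_SIZE];

   /* Claim an IPI number only when the guest OS actually asks for it,
    * so the backing HW resources can be allocated near the right vCPU.
    * Returns false if the number is outside the advertised range. */
   static bool claim_ipi(int lisn)
   {
       if (lisn < 0 || lisn >= IPI_RANGE_SIZE) {
           return false;
       }
       if (!claimed[lisn]) {
           claimed[lisn] = true;  /* would go through the KVM device here */
       }
       return true;
   }

   int main(void)
   {
       assert(claim_ipi(42));              /* claimed on first use */
       assert(claimed[42]);
       assert(!claimed[43]);               /* untouched numbers stay free */
       assert(!claim_ipi(IPI_RANGE_SIZE)); /* out of range is rejected */
       printf("on-demand claim ok\n");
       return 0;
   }
   ```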


All this to say that we need to better size the range in the
"ibm,xive-lisn-ranges" property if it's broken for vSMT.

Then, I think the IPIs can be treated just like the PCI MSIs,
but they need to be claimed first. That's the ugly part.

Should we add a special check in h_int_set_source_config to
deal with unclaimed IPIs that are being configured?


C.
