On 3/9/21 6:08 PM, Daniel Henrique Barboza wrote:
> 
> 
> On 3/9/21 12:33 PM, Cédric Le Goater wrote:
>> On 3/8/21 6:13 PM, Greg Kurz wrote:
>>> On Wed, 3 Mar 2021 18:48:50 +0100
>>> Cédric Le Goater <c...@kaod.org> wrote:
>>>
>>>> The 'chip_id' field of the XIVE CPU structure is used to choose a
>>>> target for a source located on the same chip when possible. This field
>>>> is assigned on the PowerNV platform using the "ibm,chip-id" property
>>>> on pSeries under KVM when NUMA nodes are defined but it is undefined
>>>
>>> This sentence seems to have a syntax problem... like it is missing an
>>> 'and' before 'on pSeries'.
>>
>> ah yes, or simply a comma.
>>
>>>> under PowerVM. The XIVE source structure has a similar field
>>>> 'src_chip' which is only assigned on the PowerNV platform.
>>>>
>>>> cpu_to_node() returns a compatible value on all platforms, 0 being the
>>>> default node. It will also give us the opportunity to set the affinity
>>>> of sources on pSeries when we can localize them.
>>>>
>>>
>>> IIUC this relies on the fact that the NUMA node id is equal to the chip
>>> id on PowerNV, i.e. xc->chip_id, which is passed to OPAL, remains stable
>>> with this change.
>>
>> Linux sets the NUMA node in numa_setup_cpu(). On pseries, the hcall
>> H_HOME_NODE_ASSOCIATIVITY returns the node id if I am correct (Daniel
>> in Cc:)
> 
> That's correct. H_HOME_NODE_ASSOCIATIVITY returns not only the node_id, but
> a list with the ibm,associativity domains of the CPU that "proc-no" (processor
> identifier) is mapped to inside QEMU.
> 
> node_id in this case, considering that we're working with a reference-points
> of size 4, is the 4th element of the returned list. The last element is
> "procno" itself.
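
To make the layout described above concrete, here is a minimal sketch in
userspace C. It assumes the returned list can be modelled as a flat array
of 32-bit domains whose last entry is the processor number; the function
names and that layout are hypothetical, for illustration only, and are not
the kernel's or QEMU's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the returned list: a flat array of associativity
 * domains whose last entry is the processor number ("procno").
 * 1-based position of the node id, per "the 4th element of the list".
 */
#define NODE_ID_POSITION 4

/* Store the node id and return 0, or return -1 if the list is too short. */
static inline int assoc_list_node_id(const uint32_t *list, size_t len,
                                     uint32_t *node_id)
{
    if (list == NULL || node_id == NULL || len < NODE_ID_POSITION)
        return -1;
    *node_id = list[NODE_ID_POSITION - 1];
    return 0;
}

/* Store the trailing "procno" entry and return 0, or -1 on an empty list. */
static inline int assoc_list_procno(const uint32_t *list, size_t len,
                                    uint32_t *procno)
{
    if (list == NULL || procno == NULL || len == 0)
        return -1;
    *procno = list[len - 1];
    return 0;
}
```

With a made-up list such as { 0, 0, 0, 1, 5 }, the first helper would
report node 1 and the second would report processor 5.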
> 
> 
>>
>> On PowerNV, Linux uses "ibm,associativity" property of the CPU to find
>> the node id. This value is built from the chip id in OPAL, so the
>> value returned by cpu_to_node(cpu) and the value of the "ibm,chip-id"
>> property are unlikely to be different.
>>
>> cpu_to_node(cpu) is used in many places to allocate structures
>> locally to the owning node. XIVE is not an exception (see below in the
>> same patch); it is better to be consistent and get the same information
>> (node id) using the same routine.
>>
>>
>> In Linux, "ibm,chip-id" is only used in low level PowerNV drivers:
>> LPC, XSCOM, RNG, VAS, NX. XIVE should be in that list also but skiboot
>> unifies the controllers of the system to only expose one to the OS. This
>> is problematic and should be changed but it's another topic.
>>
>>
>>> On the other hand, you have the pSeries case under PowerVM that
>>> doesn't set xc->chip_id, which isn't passed to any hcall AFAICT.
>>
>> yes "ibm,chip-id" is an OPAL concept unfortunately and it has no meaning
>> under PAPR. xc->chip_id on pseries (PowerVM) will contain an invalid
>> chip id.
>>
>> QEMU/KVM exposes "ibm,chip-id" but it's not used. (its value is not
>> always correct btw)
> 
> 
> If you have a way to reliably reproduce this, let me know and I'll fix it
> up in QEMU.

with :

   -smp 4,cores=1,maxcpus=8 \
   -object memory-backend-ram,id=ram-node0,size=2G \
   -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \
   -object memory-backend-ram,id=ram-node1,size=2G \
   -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1

# dmesg | grep numa
[    0.013106] numa: Node 0 CPUs: 0-1
[    0.013136] numa: Node 1 CPUs: 2-3

# dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id
                ibm,chip-id = <0x01>;
                ibm,chip-id = <0x02>;
                ibm,chip-id = <0x00>;
                ibm,chip-id = <0x03>;

with :

   -smp 4,cores=4,maxcpus=8,threads=1 \
   -object memory-backend-ram,id=ram-node0,size=2G \
   -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \
   -object memory-backend-ram,id=ram-node1,size=2G \
   -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1

# dmesg | grep numa
[    0.013106] numa: Node 0 CPUs: 0-1
[    0.013136] numa: Node 1 CPUs: 2-3

# dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id
                ibm,chip-id = <0x00>;
                ibm,chip-id = <0x00>;
                ibm,chip-id = <0x00>;
                ibm,chip-id = <0x00>;
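
Both outputs above would be consistent with a chip id derived purely from
the -smp topology, i.e. cpu_index / (cores * threads), with no regard for
the -numa options. That formula is an assumption inferred from the values
shown here, not something taken from the QEMU source, and threads=1 is
assumed for the first run since it was not given. A small userspace sketch
of the comparison, with hypothetical helper names:

```c
#include <stdio.h>

/*
 * ASSUMPTION: chip id guessed as cpu_index / (cores * threads). This is
 * inferred from the two outputs above, not read from the QEMU source.
 */
static inline unsigned int guessed_chip_id(unsigned int cpu_index,
                                           unsigned int cores,
                                           unsigned int threads)
{
    unsigned int per_chip = cores * threads;

    return per_chip ? cpu_index / per_chip : 0;
}

/*
 * NUMA node as configured by the -numa options of both runs:
 * cpus 0-1 and 4-5 belong to node 0, cpus 2-3 and 6-7 to node 1.
 */
static inline unsigned int numa_node_of(unsigned int cpu_index)
{
    return (cpu_index / 2) % 2;
}

/* Print each CPU and count those whose guessed chip id differs from its node. */
static inline unsigned int count_mismatches(unsigned int ncpus,
                                            unsigned int cores,
                                            unsigned int threads)
{
    unsigned int cpu, mismatches = 0;

    for (cpu = 0; cpu < ncpus; cpu++) {
        unsigned int chip = guessed_chip_id(cpu, cores, threads);
        unsigned int node = numa_node_of(cpu);

        printf("cpu %u: chip-id %u, node %u\n", cpu, chip, node);
        if (chip != node)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches(4, 1, 1) models the first run and
count_mismatches(4, 4, 1) the second. Under the assumed formula, neither
run has a chip id that follows the NUMA node on every CPU.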

I think we should simply remove "ibm,chip-id" since it's not used and
not in the PAPR spec.

Thanks,

C.

 

> 
> Thanks,
> 
> 
> DHB
> 
> 
>>
>>> It looks like the chip id is only used for localization purpose in
>>> this case, right ?
>>
>> Yes and PAPR sources are not localized. So it's not used. MSI sources
>> could be if we rewrote the MSI driver.
>>
>>> In this case, what about doing this change for pSeries only,
>>> somewhere in spapr.c ?
>>
>> The IPI code is common to all platforms and all have the same issue.
>> I'd rather not.
>>
>> Thanks,
>>
>> C.
>>  
>>>> Signed-off-by: Cédric Le Goater <c...@kaod.org>
>>>> ---
>>>>   arch/powerpc/sysdev/xive/common.c | 7 +------
>>>>   1 file changed, 1 insertion(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
>>>> index 595310e056f4..b8e456da28aa 100644
>>>> --- a/arch/powerpc/sysdev/xive/common.c
>>>> +++ b/arch/powerpc/sysdev/xive/common.c
>>>> @@ -1335,16 +1335,11 @@ static int xive_prepare_cpu(unsigned int cpu)
>>>>         xc = per_cpu(xive_cpu, cpu);
>>>>       if (!xc) {
>>>> -        struct device_node *np;
>>>> -
>>>>           xc = kzalloc_node(sizeof(struct xive_cpu),
>>>>                     GFP_KERNEL, cpu_to_node(cpu));
>>>>           if (!xc)
>>>>               return -ENOMEM;
>>>> -        np = of_get_cpu_node(cpu, NULL);
>>>> -        if (np)
>>>> -            xc->chip_id = of_get_ibm_chip_id(np);
>>>> -        of_node_put(np);
>>>> +        xc->chip_id = cpu_to_node(cpu);
>>>>           xc->hw_ipi = XIVE_BAD_IRQ;
>>>>             per_cpu(xive_cpu, cpu) = xc;
>>>
>>
