> -----Original Message-----
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Thursday, January 09, 2020 5:54 PM
> To: Zengtao (B)
> Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-triv...@nongnu.org;
> Shannon Zhao; Peter Maydell; qemu-...@nongnu.org
> Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure by
> node_id ascending order
>
> On Thu, 9 Jan 2020 02:45:58 +0000
> "Zengtao (B)" <prime.z...@hisilicon.com> wrote:
>
> > > -----Original Message-----
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: Thursday, January 09, 2020 12:39 AM
> > > To: Zengtao (B)
> > > Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-triv...@nongnu.org;
> > > Shannon Zhao; Peter Maydell; qemu-...@nongnu.org
> > > Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure by
> > > node_id ascending order
> > >
> > > On Wed, 8 Jan 2020 04:02:10 +0000
> > > "Zengtao (B)" <prime.z...@hisilicon.com> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > > > Sent: Tuesday, January 07, 2020 11:50 PM
> > > > > To: Zengtao (B)
> > > > > Cc: Michael S. Tsirkin; qemu-devel@nongnu.org; qemu-triv...@nongnu.org;
> > > > > Shannon Zhao; Peter Maydell; qemu-...@nongnu.org
> > > > > Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors structure by
> > > > > node_id ascending order
> > > > >
> > > > > On Tue, 7 Jan 2020 10:29:22 +0000
> > > > > "Zengtao (B)" <prime.z...@hisilicon.com> wrote:
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > > > > > Sent: Tuesday, January 07, 2020 5:33 PM
> > > > > > > To: Zengtao (B)
> > > > > > > Cc: qemu-devel@nongnu.org; qemu-triv...@nongnu.org; Shannon Zhao;
> > > > > > > Peter Maydell; Igor Mammedov; qemu-...@nongnu.org
> > > > > > > Subject: Re: [PATCH] hw/arm/acpi: Pack the SRAT processors
> > > > > > > structure by node_id ascending order
> > > > > > >
> > > > > > > On Tue, Jan 07, 2020 at 05:18:49PM +0800, Zeng Tao wrote:
> > > > > > > > When booting the guest linux with the following numa configuration:
> > > > > > > >   -numa node,node_id=1,cpus=0-3
> > > > > > > >   -numa node,node_id=0,cpus=4-7
> > > > > > > > We can get the following numa topology in the guest system:
> > > > > > > > Architecture:        aarch64
> > > > > > > > Byte Order:          Little Endian
> > > > > > > > CPU(s):              8
> > > > > > > > On-line CPU(s) list: 0-7
> > > > > > > > Thread(s) per core:  1
> > > > > > > > Core(s) per socket:  8
> > > > > > > > Socket(s):           1
> > > > > > > > NUMA node(s):        2
> > > > > > > > L1d cache:           unknown size
> > > > > > > > L1i cache:           unknown size
> > > > > > > > L2 cache:            unknown size
> > > > > > > > NUMA node0 CPU(s):   0-3
> > > > > > > > NUMA node1 CPU(s):   4-7
> > > > > > > > The cpus 0-3 are assigned to NUMA node 1 in QEMU but get NUMA
> > > > > > > > node 0 in the guest.
> > > > > > > >
> > > > > > > > In fact, in the linux kernel, the numa node id is allocated in
> > > > > > > > the order of the ACPI SRAT processor structures, so cpu 0 is the
> > > > > > > > first one to allocate its NUMA node id, and it gets NUMA node 0.
> > > > > > > >
> > > > > > > > To fix this issue, we pack the SRAT processor structures in numa
> > > > > > > > node id order instead of the default cpu number order.
> > > > > > > >
> > > > > > > > Signed-off-by: Zeng Tao <prime.z...@hisilicon.com>
> > > > > > >
> > > > > > > Does this matter?
> > > > > > > If yes, fixing linux to take the node id from the proximity
> > > > > > > field in ACPI seems cleaner ...
> > > > > >
> > > > > > In fact, I just want to align the node_id concept in QEMU and Linux.
> > > > > > If we fix the kernel side, we need to align with all platforms.
> > > > > > I think maybe that is not a good idea.
> > > > >
> > > > > If linux makes up node IDs on its own, it would be hard for it to
> > > > > map SRAT entries to other tables that use the proximity id as well.
> > > > >
> > > > > So it would need to maintain a map of [proximity id] -> [node id]
> > > > > (and the reverse) somewhere to resolve mappings on other tables.
> > > > > If it doesn't do this then it's broken and works just by accident;
> > > > > if it does, the fix probably should be in that code and not in QEMU.
> > > > >
> > > >
> > > > Hmm, the problem is how to understand the concept of node id.
> > > > 1. In dts, there is a node id. Both QEMU and Linux can use it
> > > >    directly, so there is no conflict.
> > > > 2. In ACPI, there is only a proximity domain, but no node id. There
> > > >    should be a mapping between them, but the kernel and QEMU each
> > > >    maintain their own rule, and unfortunately they sometimes conflict
> > > >    with each other.
> > > >
> > > > There is no specification to indicate what we should do to maintain
> > > > the mapping, so it's hard to align QEMU and Linux.
> > > >
> > > > Any suggestion, or do we just accept it as a rule since it doesn't
> > > > affect much?
> > >
> > > If node id generation is driven by SRAT content, it might be reasonable
> > > to ask the SRAT parser in the kernel to create node ids using the
> > > proximity value instead of the order in which ACPI_SRAT_PROCESSOR_GICC
> > > structures are enumerated.
> > > That way the node id would match the ACPI spec.
> >
> > I don't quite understand "That way node id would match ACPI spec."
> > I checked the ACPI 6.3 spec, and I didn't find any description that the
> > node id should be equal to the proximity value. In section 6.2.15, there
> > is indeed an example in which the node numbers equal the proximity
> > values.
>
> There is only a proximity id in the spec (which conceptually is the same
> as a node id).
> What I'm saying is that the linux kernel ACPI parser could use it as the
> node id instead of using a counter for making up node ids.
>
> (The issue with that is that proximity values could be sparse, so one
> would need to audit the current node id users to make sure that using
> the proximity id won't break anything.)

I think that is really a problem. In the linux kernel:

#if MAX_NUMNODES > 256
#define MAX_PXM_DOMAINS MAX_NUMNODES
#else
#define MAX_PXM_DOMAINS 256
#endif

It seems that using a counter is a better choice.
> > Thanks.
> >
> > > But even with that I wouldn't expect cpu ids to match, as they are
> > > basically arbitrary numbers on both sides. One would need to use arch
> > > specific ids to reliably match cpus on both sides (MPIDR in the ARM
> > > case or APIC ID on x86).
> > >
> > > > > > > > >  ---
> > > > > > > > >  hw/arm/virt-acpi-build.c | 23 +++++++++++++++--------
> > > > > > > > >  1 file changed, 15 insertions(+), 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > > > > > > > index bd5f771..497192b 100644
> > > > > > > > > --- a/hw/arm/virt-acpi-build.c
> > > > > > > > > +++ b/hw/arm/virt-acpi-build.c
> > > > > > > > > @@ -520,7 +520,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > > > > > > > >      AcpiSystemResourceAffinityTable *srat;
> > > > > > > > >      AcpiSratProcessorGiccAffinity *core;
> > > > > > > > >      AcpiSratMemoryAffinity *numamem;
> > > > > > > > > -    int i, srat_start;
> > > > > > > > > +    int i, j, srat_start;
> > > > > > > > > +    uint32_t node_id;
> > > > > > > > >      uint64_t mem_base;
> > > > > > > > >      MachineClass *mc = MACHINE_GET_CLASS(vms);
> > > > > > > > >      MachineState *ms = MACHINE(vms);
> > > > > > > > > @@ -530,13 +531,19 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > > > > > > > >      srat = acpi_data_push(table_data, sizeof(*srat));
> > > > > > > > >      srat->reserved1 = cpu_to_le32(1);
> > > > > > > > >
> > > > > > > > > -    for (i = 0; i < cpu_list->len; ++i) {
> > > > > > > > > -        core = acpi_data_push(table_data, sizeof(*core));
> > > > > > > > > -        core->type = ACPI_SRAT_PROCESSOR_GICC;
> > > > > > > > > -        core->length = sizeof(*core);
> > > > > > > > > -        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
> > > > > > > > > -        core->acpi_processor_uid = cpu_to_le32(i);
> > > > > > > > > -        core->flags = cpu_to_le32(1);
> > > > > > > > > +    for (i = 0; i < ms->numa_state->num_nodes; ++i) {
> > > > > > > > > +        for (j = 0; j < cpu_list->len; ++j) {
> > > > > > > >
> > > > > > > > Hmm O(n^2) isn't great ...
> > > > > > >
> > > > > > > Good suggestion, thank you.
> > > > > > >
> > > > > > > > > +            node_id = cpu_to_le32(cpu_list->cpus[j].props.node_id);
> > > > > > > > > +            if (node_id != i) {
> > > > > > > > > +                continue;
> > > > > > > > > +            }
> > > > > > > > > +            core = acpi_data_push(table_data, sizeof(*core));
> > > > > > > > > +            core->type = ACPI_SRAT_PROCESSOR_GICC;
> > > > > > > > > +            core->length = sizeof(*core);
> > > > > > > > > +            core->proximity = node_id;
> > > > > > > > > +            core->acpi_processor_uid = cpu_to_le32(j);
> > > > > > > > > +            core->flags = cpu_to_le32(1);
> > > > > > > > > +        }
> > > > > > > > >      }
> > > > > > > >
> > > > > > > > is the issue arm specific? wouldn't it affect x86 too?
> > > > > > >
> > > > > > > Good question, I think it will affect x86, but I need to confirm.
> > > > > > >
> > > > > > > > >      mem_base = vms->memmap[VIRT_MEM].base;
> > > > > > > > > --
> > > > > > > > > 2.8.1