Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
Hi Igor, On 2/28/22 6:54 PM, Igor Mammedov wrote: On Mon, 28 Feb 2022 12:26:53 +0800 Gavin Shan wrote: On 2/25/22 6:03 PM, Igor Mammedov wrote: On Fri, 25 Feb 2022 16:41:43 +0800 Gavin Shan wrote: On 2/17/22 10:14 AM, Gavin Shan wrote: On 1/26/22 5:14 PM, Igor Mammedov wrote: On Wed, 26 Jan 2022 13:24:10 +0800 Gavin Shan wrote: The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id() when it isn't provided explicitly. However, the CPU topology isn't fully considered in the default association and it causes CPU topology broken warnings on booting Linux guest. For example, the following warning messages are observed when the Linux guest is booted with the following command lines. /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host \ -cpu host \ -smp 6,sockets=2,cores=3,threads=1 \ -m 1024M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=128M \ -object memory-backend-ram,id=mem1,size=128M \ -object memory-backend-ram,id=mem2,size=128M \ -object memory-backend-ram,id=mem3,size=128M \ -object memory-backend-ram,id=mem4,size=128M \ -object memory-backend-ram,id=mem4,size=384M \ -numa node,nodeid=0,memdev=mem0 \ -numa node,nodeid=1,memdev=mem1 \ -numa node,nodeid=2,memdev=mem2 \ -numa node,nodeid=3,memdev=mem3 \ -numa node,nodeid=4,memdev=mem4 \ -numa node,nodeid=5,memdev=mem5 : alternatives: patching kernel code BUG: arch topology borken the CLS domain not a subset of the MC domain BUG: arch topology borken the DIE domain not a subset of the NODE domain With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5 are associated with NODE#0 to NODE#5 separately. That's incorrect because CPU#0/1/2 should be associated with same NUMA node because they're seated in same socket. This fixes the issue by considering the socket when default CPU-to-NUMA is given. With this applied, no more CPU topology broken warnings are seen from the Linux guest. 
The 6 CPUs are associated with NODE#0/1, but there are no CPUs associated with NODE#2/3/4/5. From migration point of view it looks fine to me, and doesn't need a compat knob since NUMA data (on virt-arm) only used to construct ACPI tables (and we don't version those unless something is broken by it). Signed-off-by: Gavin Shan --- hw/arm/virt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 141350bf21..b4a95522d3 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index) static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx) { - return idx % ms->numa_state->num_nodes; + return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads); I'd like for ARM folks to confirm whether above is correct (i.e. socket is NUMA node boundary and also if above topo vars could have odd values. Don't look at horribly complicated x86 as example, but it showed that vendors could stash pretty much anything there, so we should consider it here as well and maybe forbid that in smp virt-arm parser) After doing some investigation, I don't think the socket is NUMA node boundary. Unfortunately, I didn't find it's documented like this in any documents after checking device-tree specification, Linux CPU topology and NUMA binding documents. However, there are two options here according to Linux (guest) kernel code: (A) socket is NUMA node boundary (B) CPU die is NUMA node boundary. They are equivalent as CPU die isn't supported on arm/virt machine. Besides, the topology of one-to-one association between socket and NUMA node sounds natural and simplified. So I think (A) is the best way to go. Another thing I want to explain here is how the changes affect the memory allocation in Linux guest. 
Taking the command lines included in the commit log as an example, the first two NUMA nodes are bound to CPUs while the other 4 NUMA nodes are regarded as remote NUMA nodes to CPUs. The remote NUMA node won't accommodate the memory allocation until the memory in the near (local) NUMA node becomes exhausted. However, it's uncertain how the memory is hosted if memory binding isn't applied. Besides, I think the code should be improved like below to avoid overflow on ms->numa_state->num_nodes. static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx) { - return idx % ms->numa_state->num_nodes; + int node_idx; + + node_idx = idx / (ms-
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
On Mon, 28 Feb 2022 12:26:53 +0800 Gavin Shan wrote: > Hi Igor, > > On 2/25/22 6:03 PM, Igor Mammedov wrote: > > On Fri, 25 Feb 2022 16:41:43 +0800 > > Gavin Shan wrote: > >> On 2/17/22 10:14 AM, Gavin Shan wrote: > >>> On 1/26/22 5:14 PM, Igor Mammedov wrote: > On Wed, 26 Jan 2022 13:24:10 +0800 > Gavin Shan wrote: > > > The default CPU-to-NUMA association is given by > > mc->get_default_cpu_node_id() > > when it isn't provided explicitly. However, the CPU topology isn't fully > > considered in the default association and it causes CPU topology broken > > warnings on booting Linux guest. > > > > For example, the following warning messages are observed when the Linux > > guest > > is booted with the following command lines. > > > > /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ > > -accel kvm -machine virt,gic-version=host \ > > -cpu host \ > > -smp 6,sockets=2,cores=3,threads=1 \ > > -m 1024M,slots=16,maxmem=64G \ > > -object memory-backend-ram,id=mem0,size=128M \ > > -object memory-backend-ram,id=mem1,size=128M \ > > -object memory-backend-ram,id=mem2,size=128M \ > > -object memory-backend-ram,id=mem3,size=128M \ > > -object memory-backend-ram,id=mem4,size=128M \ > > -object memory-backend-ram,id=mem4,size=384M \ > > -numa node,nodeid=0,memdev=mem0 \ > > -numa node,nodeid=1,memdev=mem1 \ > > -numa node,nodeid=2,memdev=mem2 \ > > -numa node,nodeid=3,memdev=mem3 \ > > -numa node,nodeid=4,memdev=mem4 \ > > -numa node,nodeid=5,memdev=mem5 > > : > > alternatives: patching kernel code > > BUG: arch topology borken > > the CLS domain not a subset of the MC domain > > > > BUG: arch topology borken > > the DIE domain not a subset of the NODE domain > > > > With current implementation of mc->get_default_cpu_node_id(), CPU#0 to > > CPU#5 > > are associated with NODE#0 to NODE#5 separately. That's incorrect > > because > > CPU#0/1/2 should be associated with same NUMA node because they're > > seated > > in same socket. 
> > > > This fixes the issue by considering the socket when default CPU-to-NUMA > > is given. With this applied, no more CPU topology broken warnings are > > seen > > from the Linux guest. The 6 CPUs are associated with NODE#0/1, but > > there are > > no CPUs associated with NODE#2/3/4/5. > > > From migration point of view it looks fine to me, and doesn't need a > > compat knob > since NUMA data (on virt-arm) only used to construct ACPI tables (and we > don't > version those unless something is broken by it). > > > > Signed-off-by: Gavin Shan > > --- > > hw/arm/virt.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index 141350bf21..b4a95522d3 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, > > unsigned cpu_index) > > static int64_t virt_get_default_cpu_node_id(const MachineState *ms, > > int idx) > > { > > - return idx % ms->numa_state->num_nodes; > > + return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * > > ms->smp.threads); > > I'd like for ARM folks to confirm whether above is correct > (i.e. socket is NUMA node boundary and also if above topo vars > could have odd values. Don't look at horribly complicated x86 > as example, but it showed that vendors could stash pretty much > anything there, so we should consider it here as well and maybe > forbid that in smp virt-arm parser) > > >>> > >>> After doing some investigation, I don't think the socket is NUMA node > >>> boundary. > >>> Unfortunately, I didn't find it's documented like this in any documents > >>> after > >>> checking device-tree specification, Linux CPU topology and NUMA binding > >>> documents. > >>> > >>> However, there are two options here according to Linux (guest) kernel > >>> code: > >>> (A) socket is NUMA node boundary (B) CPU die is NUMA node boundary. They > >>> are > >>> equivalent as CPU die isn't supported on arm/virt machine. 
Besides, the > >>> topology > >>> of one-to-one association between socket and NUMA node sounds natural and > >>> simplified. > >>> So I think (A) is the best way to go. > >>> > >>> Another thing I want to explain here is how t
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
Hi Igor, On 2/25/22 6:03 PM, Igor Mammedov wrote: On Fri, 25 Feb 2022 16:41:43 +0800 Gavin Shan wrote: On 2/17/22 10:14 AM, Gavin Shan wrote: On 1/26/22 5:14 PM, Igor Mammedov wrote: On Wed, 26 Jan 2022 13:24:10 +0800 Gavin Shan wrote: The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id() when it isn't provided explicitly. However, the CPU topology isn't fully considered in the default association and it causes CPU topology broken warnings on booting Linux guest. For example, the following warning messages are observed when the Linux guest is booted with the following command lines. /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host \ -cpu host \ -smp 6,sockets=2,cores=3,threads=1 \ -m 1024M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=128M \ -object memory-backend-ram,id=mem1,size=128M \ -object memory-backend-ram,id=mem2,size=128M \ -object memory-backend-ram,id=mem3,size=128M \ -object memory-backend-ram,id=mem4,size=128M \ -object memory-backend-ram,id=mem4,size=384M \ -numa node,nodeid=0,memdev=mem0 \ -numa node,nodeid=1,memdev=mem1 \ -numa node,nodeid=2,memdev=mem2 \ -numa node,nodeid=3,memdev=mem3 \ -numa node,nodeid=4,memdev=mem4 \ -numa node,nodeid=5,memdev=mem5 : alternatives: patching kernel code BUG: arch topology borken the CLS domain not a subset of the MC domain BUG: arch topology borken the DIE domain not a subset of the NODE domain With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5 are associated with NODE#0 to NODE#5 separately. That's incorrect because CPU#0/1/2 should be associated with same NUMA node because they're seated in same socket. This fixes the issue by considering the socket when default CPU-to-NUMA is given. With this applied, no more CPU topology broken warnings are seen from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are no CPUs associated with NODE#2/3/4/5. 
From migration point of view it looks fine to me, and doesn't need a compat knob since NUMA data (on virt-arm) only used to construct ACPI tables (and we don't version those unless something is broken by it). Signed-off-by: Gavin Shan --- hw/arm/virt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 141350bf21..b4a95522d3 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index) static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx) { - return idx % ms->numa_state->num_nodes; + return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads); I'd like for ARM folks to confirm whether above is correct (i.e. socket is NUMA node boundary and also if above topo vars could have odd values. Don't look at horribly complicated x86 as example, but it showed that vendors could stash pretty much anything there, so we should consider it here as well and maybe forbid that in smp virt-arm parser) After doing some investigation, I don't think the socket is NUMA node boundary. Unfortunately, I didn't find it's documented like this in any documents after checking device-tree specification, Linux CPU topology and NUMA binding documents. However, there are two options here according to Linux (guest) kernel code: (A) socket is NUMA node boundary (B) CPU die is NUMA node boundary. They are equivalent as CPU die isn't supported on arm/virt machine. Besides, the topology of one-to-one association between socket and NUMA node sounds natural and simplified. So I think (A) is the best way to go. Another thing I want to explain here is how the changes affect the memory allocation in Linux guest. Taking the command lines included in the commit log as an example, the first two NUMA nodes are bound to CPUs while the other 4 NUMA nodes are regarded as remote NUMA nodes to CPUs. 
The remote NUMA node won't accommodate the memory allocation until the memory in the near (local) NUMA node becomes exhausted. However, it's uncertain how the memory is hosted if memory binding isn't applied. Besides, I think the code should be improved like below to avoid overflow on ms->numa_state->num_nodes. static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx) { - return idx % ms->numa_state->num_nodes; + int node_idx; + + node_idx = idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads); + return node_idx % ms->numa_state->num_nodes; using idx directly to
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
On Fri, 25 Feb 2022 16:41:43 +0800 Gavin Shan wrote: > Hi Igor, > > On 2/17/22 10:14 AM, Gavin Shan wrote: > > On 1/26/22 5:14 PM, Igor Mammedov wrote: > >> On Wed, 26 Jan 2022 13:24:10 +0800 > >> Gavin Shan wrote: > >> > >>> The default CPU-to-NUMA association is given by > >>> mc->get_default_cpu_node_id() > >>> when it isn't provided explicitly. However, the CPU topology isn't fully > >>> considered in the default association and it causes CPU topology broken > >>> warnings on booting Linux guest. > >>> > >>> For example, the following warning messages are observed when the Linux > >>> guest > >>> is booted with the following command lines. > >>> > >>> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ > >>> -accel kvm -machine virt,gic-version=host \ > >>> -cpu host \ > >>> -smp 6,sockets=2,cores=3,threads=1 \ > >>> -m 1024M,slots=16,maxmem=64G \ > >>> -object memory-backend-ram,id=mem0,size=128M \ > >>> -object memory-backend-ram,id=mem1,size=128M \ > >>> -object memory-backend-ram,id=mem2,size=128M \ > >>> -object memory-backend-ram,id=mem3,size=128M \ > >>> -object memory-backend-ram,id=mem4,size=128M \ > >>> -object memory-backend-ram,id=mem4,size=384M \ > >>> -numa node,nodeid=0,memdev=mem0 \ > >>> -numa node,nodeid=1,memdev=mem1 \ > >>> -numa node,nodeid=2,memdev=mem2 \ > >>> -numa node,nodeid=3,memdev=mem3 \ > >>> -numa node,nodeid=4,memdev=mem4 \ > >>> -numa node,nodeid=5,memdev=mem5 > >>> : > >>> alternatives: patching kernel code > >>> BUG: arch topology borken > >>> the CLS domain not a subset of the MC domain > >>> > >>> BUG: arch topology borken > >>> the DIE domain not a subset of the NODE domain > >>> > >>> With current implementation of mc->get_default_cpu_node_id(), CPU#0 to > >>> CPU#5 > >>> are associated with NODE#0 to NODE#5 separately. That's incorrect because > >>> CPU#0/1/2 should be associated with same NUMA node because they're seated > >>> in same socket. 
> >>> > >>> This fixes the issue by considering the socket when default CPU-to-NUMA > >>> is given. With this applied, no more CPU topology broken warnings are seen > >>> from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there > >>> are > >>> no CPUs associated with NODE#2/3/4/5. > >> > >>> From migration point of view it looks fine to me, and doesn't need a > >>> compat knob > >> since NUMA data (on virt-arm) only used to construct ACPI tables (and we > >> don't > >> version those unless something is broken by it). > >> > >> > >>> Signed-off-by: Gavin Shan > >>> --- > >>> hw/arm/virt.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c > >>> index 141350bf21..b4a95522d3 100644 > >>> --- a/hw/arm/virt.c > >>> +++ b/hw/arm/virt.c > >>> @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned > >>> cpu_index) > >>> static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int > >>> idx) > >>> { > >>> - return idx % ms->numa_state->num_nodes; > >>> + return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * > >>> ms->smp.threads); > >> > >> I'd like for ARM folks to confirm whether above is correct > >> (i.e. socket is NUMA node boundary and also if above topo vars > >> could have odd values. Don't look at horribly complicated x86 > >> as example, but it showed that vendors could stash pretty much > >> anything there, so we should consider it here as well and maybe > >> forbid that in smp virt-arm parser) > >> > > > > After doing some investigation, I don't think the socket is NUMA node > > boundary. > > Unfortunately, I didn't find it's documented like this in any documents > > after > > checking device-tree specification, Linux CPU topology and NUMA binding > > documents. > > > > However, there are two options here according to Linux (guest) kernel code: > > (A) socket is NUMA node boundary (B) CPU die is NUMA node boundary. 
They > > are > > equivalent as CPU die isn't supported on arm/virt machine. Besides, the > > topology > > of one-to-one association between socket and NUMA node sounds natural and > > simplified. > > So I think (A) is the best way to go. > > > > Another thing I want to explain here is how the changes affect the memory > > allocation in Linux guest. Taking the command lines included in the commit > > log as an example, the first two NUMA nodes are bound to CPUs while the > > other > > 4 NUMA nodes are regarded as remote NUMA nodes to CPUs. The remote NUMA node > > won't accommodate the memory allocation until the memory in the near (local) > > NUMA node becomes exhaust
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
Hi Igor, On 2/17/22 10:14 AM, Gavin Shan wrote: On 1/26/22 5:14 PM, Igor Mammedov wrote: On Wed, 26 Jan 2022 13:24:10 +0800 Gavin Shan wrote: The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id() when it isn't provided explicitly. However, the CPU topology isn't fully considered in the default association and it causes CPU topology broken warnings on booting Linux guest. For example, the following warning messages are observed when the Linux guest is booted with the following command lines. /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host \ -cpu host \ -smp 6,sockets=2,cores=3,threads=1 \ -m 1024M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=128M \ -object memory-backend-ram,id=mem1,size=128M \ -object memory-backend-ram,id=mem2,size=128M \ -object memory-backend-ram,id=mem3,size=128M \ -object memory-backend-ram,id=mem4,size=128M \ -object memory-backend-ram,id=mem4,size=384M \ -numa node,nodeid=0,memdev=mem0 \ -numa node,nodeid=1,memdev=mem1 \ -numa node,nodeid=2,memdev=mem2 \ -numa node,nodeid=3,memdev=mem3 \ -numa node,nodeid=4,memdev=mem4 \ -numa node,nodeid=5,memdev=mem5 : alternatives: patching kernel code BUG: arch topology borken the CLS domain not a subset of the MC domain BUG: arch topology borken the DIE domain not a subset of the NODE domain With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5 are associated with NODE#0 to NODE#5 separately. That's incorrect because CPU#0/1/2 should be associated with same NUMA node because they're seated in same socket. This fixes the issue by considering the socket when default CPU-to-NUMA is given. With this applied, no more CPU topology broken warnings are seen from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are no CPUs associated with NODE#2/3/4/5. 
From migration point of view it looks fine to me, and doesn't need a compat knob since NUMA data (on virt-arm) only used to construct ACPI tables (and we don't version those unless something is broken by it). Signed-off-by: Gavin Shan --- hw/arm/virt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 141350bf21..b4a95522d3 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index) static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx) { - return idx % ms->numa_state->num_nodes; + return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads); I'd like for ARM folks to confirm whether above is correct (i.e. socket is NUMA node boundary and also if above topo vars could have odd values. Don't look at horribly complicated x86 as example, but it showed that vendors could stash pretty much anything there, so we should consider it here as well and maybe forbid that in smp virt-arm parser) After doing some investigation, I don't think the socket is NUMA node boundary. Unfortunately, I didn't find it's documented like this in any documents after checking device-tree specification, Linux CPU topology and NUMA binding documents. However, there are two options here according to Linux (guest) kernel code: (A) socket is NUMA node boundary (B) CPU die is NUMA node boundary. They are equivalent as CPU die isn't supported on arm/virt machine. Besides, the topology of one-to-one association between socket and NUMA node sounds natural and simplified. So I think (A) is the best way to go. Another thing I want to explain here is how the changes affect the memory allocation in Linux guest. Taking the command lines included in the commit log as an example, the first two NUMA nodes are bound to CPUs while the other 4 NUMA nodes are regarded as remote NUMA nodes to CPUs. 
The remote NUMA node won't accommodate the memory allocation until the memory in the near (local) NUMA node becomes exhausted. However, it's uncertain how the memory is hosted if memory binding isn't applied. Besides, I think the code should be improved like below to avoid overflow on ms->numa_state->num_nodes. static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx) { - return idx % ms->numa_state->num_nodes; + int node_idx; + + node_idx = idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads); + return node_idx % ms->numa_state->num_nodes; } Kindly ping... } static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms) Thanks, Gavin
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
On 1/26/22 5:14 PM, Igor Mammedov wrote:
> On Wed, 26 Jan 2022 13:24:10 +0800
> Gavin Shan wrote:
>
>> The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
>> when it isn't provided explicitly. However, the CPU topology isn't fully
>> considered in the default association and it causes CPU topology broken
>> warnings on booting Linux guest.
>>
>> For example, the following warning messages are observed when the Linux guest
>> is booted with the following command lines.
>>
>>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>>   -accel kvm -machine virt,gic-version=host \
>>   -cpu host \
>>   -smp 6,sockets=2,cores=3,threads=1 \
>>   -m 1024M,slots=16,maxmem=64G \
>>   -object memory-backend-ram,id=mem0,size=128M \
>>   -object memory-backend-ram,id=mem1,size=128M \
>>   -object memory-backend-ram,id=mem2,size=128M \
>>   -object memory-backend-ram,id=mem3,size=128M \
>>   -object memory-backend-ram,id=mem4,size=128M \
>>   -object memory-backend-ram,id=mem4,size=384M \
>>   -numa node,nodeid=0,memdev=mem0 \
>>   -numa node,nodeid=1,memdev=mem1 \
>>   -numa node,nodeid=2,memdev=mem2 \
>>   -numa node,nodeid=3,memdev=mem3 \
>>   -numa node,nodeid=4,memdev=mem4 \
>>   -numa node,nodeid=5,memdev=mem5
>>      :
>>   alternatives: patching kernel code
>>   BUG: arch topology borken
>>   the CLS domain not a subset of the MC domain
>>
>>   BUG: arch topology borken
>>   the DIE domain not a subset of the NODE domain
>>
>> With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
>> are associated with NODE#0 to NODE#5 separately. That's incorrect because
>> CPU#0/1/2 should be associated with same NUMA node because they're seated
>> in same socket.
>>
>> This fixes the issue by considering the socket when default CPU-to-NUMA
>> is given. With this applied, no more CPU topology broken warnings are seen
>> from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are
>> no CPUs associated with NODE#2/3/4/5.
>
> From migration point of view it looks fine to me, and doesn't need a compat knob
> since NUMA data (on virt-arm) only used to construct ACPI tables (and we don't
> version those unless something is broken by it).
>
>> Signed-off-by: Gavin Shan
>> ---
>>  hw/arm/virt.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 141350bf21..b4a95522d3 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>>
>>  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>>  {
>> -    return idx % ms->numa_state->num_nodes;
>> +    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
>
> I'd like for ARM folks to confirm whether above is correct
> (i.e. socket is NUMA node boundary and also if above topo vars
> could have odd values. Don't look at horribly complicated x86
> as example, but it showed that vendors could stash pretty much
> anything there, so we should consider it here as well and maybe
> forbid that in smp virt-arm parser)
>

After doing some investigation, I don't think the socket is NUMA node boundary.
Unfortunately, I didn't find it's documented like this in any documents after
checking device-tree specification, Linux CPU topology and NUMA binding documents.

However, there are two options here according to Linux (guest) kernel code:
(A) socket is NUMA node boundary (B) CPU die is NUMA node boundary. They are
equivalent as CPU die isn't supported on arm/virt machine. Besides, the topology
of one-to-one association between socket and NUMA node sounds natural and simplified.
So I think (A) is the best way to go.

Another thing I want to explain here is how the changes affect the memory
allocation in Linux guest. Taking the command lines included in the commit
log as an example, the first two NUMA nodes are bound to CPUs while the other
4 NUMA nodes are regarded as remote NUMA nodes to CPUs. The remote NUMA node
won't accommodate the memory allocation until the memory in the near (local)
NUMA node becomes exhausted. However, it's uncertain how the memory is hosted
if memory binding isn't applied.

Besides, I think the code should be improved like below to avoid overflow on
ms->numa_state->num_nodes.

 static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx % ms->numa_state->num_nodes;
+    int node_idx;
+
+    node_idx = idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
+    return node_idx % ms->numa_state->num_nodes;
 }

>>  }
>>
>>  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)

Thanks,
Gavin
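
For readers following the thread, the two mappings under discussion can be compared in a small standalone sketch. This is not QEMU code: `SmpTopology` and both function names are hypothetical stand-ins for the `ms->smp` fields and `ms->numa_state->num_nodes`, and it assumes every topology level is at least 1 and that at least one NUMA node exists.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the ms->smp fields used in the patch. */
typedef struct {
    unsigned sockets;
    unsigned dies;
    unsigned clusters;
    unsigned cores;
    unsigned threads;
} SmpTopology;

/* Number of CPUs that sit in one socket. */
static unsigned cpus_per_socket(const SmpTopology *smp)
{
    unsigned n = smp->dies * smp->clusters * smp->cores * smp->threads;

    assert(n > 0); /* assumption: each topology level is >= 1 */
    return n;
}

/* Mapping before the patch: CPUs are spread round-robin over the nodes. */
static int64_t node_id_round_robin(int idx, unsigned num_nodes)
{
    assert(idx >= 0 && num_nodes > 0);
    return idx % (int)num_nodes;
}

/*
 * Mapping from the follow-up proposal: all CPUs of a socket share a node,
 * and the socket index wraps around when there are fewer nodes than sockets.
 */
static int64_t node_id_by_socket(const SmpTopology *smp, int idx,
                                 unsigned num_nodes)
{
    int socket_idx;

    assert(idx >= 0 && num_nodes > 0);
    socket_idx = idx / (int)cpus_per_socket(smp);
    return socket_idx % (int)num_nodes;
}
```

With the topology from the commit log (2 sockets, 3 cores per socket, 1 thread per core, 6 NUMA nodes), the round-robin mapping places CPU#0 to CPU#5 on nodes 0 to 5, while the socket-based mapping places CPU#0/1/2 on node 0 and CPU#3/4/5 on node 1.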
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
On 2/15/22 4:32 PM, Andrew Jones wrote:
> On Tue, Feb 15, 2022 at 04:19:01PM +0800, Gavin Shan wrote:
>> The issue isn't related to CPU topology directly. It's actually related
>> to the fact: the default NUMA node ID will be picked for one particular
>> CPU if the associated NUMA node ID isn't provided by users explicitly.
>> So it's related to the CPU-to-NUMA association.
>>
>> For example, the CPU-to-NUMA association is breaking socket boundary
>> without the code change included in this patch when the guest is booted
>> with the command lines like below. With this patch applied, the CPU-to-NUMA
>> association is following socket boundary, to make Linux guest happy.
>
> Gavin,
>
> Please look at Igor's request for more information. Are we sure that a
> socket is a NUMA node boundary? Are we sure we can assume an even
> distribution for sockets to nodes or nodes to sockets? If so, where is
> that documented?
>

Yes, I was investigating the code for Igor's questions, but I hadn't reached
a conclusion when I replied to Yanan. I will reply in Igor's thread, and let's
discuss it there.

Thanks,
Gavin
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
On 1/28/22 3:05 PM, wangyanan (Y) via wrote:
> On 2022/1/26 17:14, Igor Mammedov wrote:
>> On Wed, 26 Jan 2022 13:24:10 +0800
>> Gavin Shan wrote:
>>
>>> The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
>>> when it isn't provided explicitly. However, the CPU topology isn't fully
>>> considered in the default association and it causes CPU topology broken
>>> warnings on booting Linux guest.
>>>
>>> For example, the following warning messages are observed when the Linux guest
>>> is booted with the following command lines.
>>>
>>>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>>>   -accel kvm -machine virt,gic-version=host \
>>>   -cpu host \
>>>   -smp 6,sockets=2,cores=3,threads=1 \
>>>   -m 1024M,slots=16,maxmem=64G \
>>>   -object memory-backend-ram,id=mem0,size=128M \
>>>   -object memory-backend-ram,id=mem1,size=128M \
>>>   -object memory-backend-ram,id=mem2,size=128M \
>>>   -object memory-backend-ram,id=mem3,size=128M \
>>>   -object memory-backend-ram,id=mem4,size=128M \
>>>   -object memory-backend-ram,id=mem4,size=384M \
>>>   -numa node,nodeid=0,memdev=mem0 \
>>>   -numa node,nodeid=1,memdev=mem1 \
>>>   -numa node,nodeid=2,memdev=mem2 \
>>>   -numa node,nodeid=3,memdev=mem3 \
>>>   -numa node,nodeid=4,memdev=mem4 \
>>>   -numa node,nodeid=5,memdev=mem5
>>>      :
>>>   alternatives: patching kernel code
>>>   BUG: arch topology borken
>>>   the CLS domain not a subset of the MC domain
>>>
>>>   BUG: arch topology borken
>>>   the DIE domain not a subset of the NODE domain
>>>
>>> With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
>>> are associated with NODE#0 to NODE#5 separately. That's incorrect because
>>> CPU#0/1/2 should be associated with same NUMA node because they're seated
>>> in same socket.
>>>
>>> This fixes the issue by considering the socket when default CPU-to-NUMA
>>> is given. With this applied, no more CPU topology broken warnings are seen
>>> from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are
>>> no CPUs associated with NODE#2/3/4/5.
>>
>> From migration point of view it looks fine to me, and doesn't need a compat knob
>> since NUMA data (on virt-arm) only used to construct ACPI tables (and we don't
>> version those unless something is broken by it).
>>
>>> Signed-off-by: Gavin Shan
>>> ---
>>>  hw/arm/virt.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 141350bf21..b4a95522d3 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>>>
>>>  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>>>  {
>>> -    return idx % ms->numa_state->num_nodes;
>>> +    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
>>
>> I'd like for ARM folks to confirm whether above is correct
>> (i.e. socket is NUMA node boundary and also if above topo vars
>> could have odd values. Don't look at horribly complicated x86
>> as example, but it showed that vendors could stash pretty much
>> anything there, so we should consider it here as well and maybe
>> forbid that in smp virt-arm parser)
>
> We now have a generic smp parser in machine-smp.c and it guarantees
> different machine boards a correct group of topo vars: supported topo
> vars being valid and value of unsupported ones being 1. I think it's
> safe to use them here. Or am I missing something else?
>
> Also, we may not need to include "dies" here because it's not supported
> on ARM virt machine. I believe we will always have "ms->smp.dies==1"
> for this machine.
>

I'm sorry for the delayed response because I'm just back from two weeks holiday.

The issue isn't related to CPU topology directly. It's actually related
to the fact: the default NUMA node ID will be picked for one particular
CPU if the associated NUMA node ID isn't provided by users explicitly.
So it's related to the CPU-to-NUMA association.

For example, the CPU-to-NUMA association is breaking socket boundary
without the code change included in this patch when the guest is booted
with the command lines like below. With this patch applied, the CPU-to-NUMA
association is following socket boundary, to make Linux guest happy.

  -smp 6,sockets=2,cores=3,threads=1 \
  -m 1024M,slots=16,maxmem=64G \
  -object memory-backend-ram,id=mem0,size=128M \
  -object memory-backend-ram,id=mem1,size=128M \
  -object memory-backend-ram,id=mem2,size=128M \
  -object memory-backend-ram,id=mem3,size=128M \
  -object memory-backend-ram,id=mem4,size=128M \
  -object memory-backend-ram,id=mem4,size=384M \
  -numa node,nodeid=0,memdev=mem0 \
  -numa node,nodeid=1,memdev=mem1
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
On Tue, Feb 15, 2022 at 04:19:01PM +0800, Gavin Shan wrote:
> The issue isn't related to CPU topology directly. It's actually related
> to the fact: the default NUMA node ID will be picked for one particular
> CPU if the associated NUMA node ID isn't provided by users explicitly.
> So it's related to the CPU-to-NUMA association.
>
> For example, the CPU-to-NUMA association is breaking socket boundary
> without the code change included in this patch when the guest is booted
> with the command lines like below. With this patch applied, the CPU-to-NUMA
> association is following socket boundary, to make Linux guest happy.

Gavin,

Please look at Igor's request for more information. Are we sure that a
socket is a NUMA node boundary? Are we sure we can assume an even
distribution for sockets to nodes or nodes to sockets? If so, where is
that documented?

Thanks,
drew
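One way to see why the "even distribution" question matters is to count what the per-socket formula produces when the number of sockets and the number of NUMA nodes differ. The sketch below is purely arithmetic; both helpers are hypothetical and are not QEMU functions, and how QEMU itself reacts to such configurations is not established in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of CPUs whose per-socket default node ID
 * (idx / cpus_per_socket) would fall outside [0, num_nodes).
 */
int cpus_with_invalid_node(unsigned sockets, unsigned cpus_per_socket,
                           int num_nodes)
{
    int bad = 0;

    assert(cpus_per_socket > 0 && num_nodes > 0);
    for (unsigned idx = 0; idx < sockets * cpus_per_socket; idx++) {
        int64_t node = idx / cpus_per_socket;
        if (node >= num_nodes) {
            bad++;
        }
    }
    return bad;
}

/* Hypothetical helper: number of nodes that receive no CPU by default. */
int nodes_without_cpus(unsigned sockets, int num_nodes)
{
    return num_nodes > (int)sockets ? num_nodes - (int)sockets : 0;
}
```

For the command line in the patch description (2 sockets, 6 nodes) no CPU gets an invalid node, but four nodes end up without CPUs, which matches the NODE#2/3/4/5 remark. With 4 sockets of 2 CPUs and only 2 nodes, four CPUs would be assigned node IDs 2 and 3, which were never defined.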
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
On Wed, 26 Jan 2022 at 09:14, Igor Mammedov wrote:
>
> On Wed, 26 Jan 2022 13:24:10 +0800
> Gavin Shan wrote:
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 141350bf21..b4a95522d3 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> >
> >  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
> >  {
> > -    return idx % ms->numa_state->num_nodes;
> > +    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
>
> I'd like for ARM folks to confirm whether above is correct
> (i.e. socket is NUMA node boundary and also if above topo vars
> could have odd values. Don't look at horribly complicated x86
> as example, but it showed that vendors could stash pretty much
> anything there, so we should consider it here as well and maybe
> forbid that in smp virt-arm parser)

Is there anybody on the CC list who can answer this definitively?
Certainly I have no idea about this virtual topology stuff -- from
my point of view I just want VMs to be able to have multiple CPUs
and I don't know anything about how real hardware might choose to
do NUMA topology either now or in future...

Put another way: this patch isn't on my list to do anything with;
please ping me when a decision has been made about whether it should
be applied or not.

thanks
-- PMM
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
Hi,

On 2022/1/26 17:14, Igor Mammedov wrote:
> On Wed, 26 Jan 2022 13:24:10 +0800
> Gavin Shan wrote:
>
> > The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
> > when it isn't provided explicitly. However, the CPU topology isn't fully
> > considered in the default association and it causes CPU topology broken
> > warnings on booting Linux guest.
> >
> > For example, the following warning messages are observed when the Linux guest
> > is booted with the following command lines.
> >
> >   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> >   -accel kvm -machine virt,gic-version=host \
> >   -cpu host \
> >   -smp 6,sockets=2,cores=3,threads=1 \
> >   -m 1024M,slots=16,maxmem=64G \
> >   -object memory-backend-ram,id=mem0,size=128M \
> >   -object memory-backend-ram,id=mem1,size=128M \
> >   -object memory-backend-ram,id=mem2,size=128M \
> >   -object memory-backend-ram,id=mem3,size=128M \
> >   -object memory-backend-ram,id=mem4,size=128M \
> >   -object memory-backend-ram,id=mem5,size=384M \
> >   -numa node,nodeid=0,memdev=mem0 \
> >   -numa node,nodeid=1,memdev=mem1 \
> >   -numa node,nodeid=2,memdev=mem2 \
> >   -numa node,nodeid=3,memdev=mem3 \
> >   -numa node,nodeid=4,memdev=mem4 \
> >   -numa node,nodeid=5,memdev=mem5
> >          :
> >   alternatives: patching kernel code
> >   BUG: arch topology borken
> >   the CLS domain not a subset of the MC domain
> >
> >   BUG: arch topology borken
> >   the DIE domain not a subset of the NODE domain
> >
> > With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
> > are associated with NODE#0 to NODE#5 separately. That's incorrect because
> > CPU#0/1/2 should be associated with same NUMA node because they're seated
> > in same socket.
> >
> > This fixes the issue by considering the socket when default CPU-to-NUMA
> > is given. With this applied, no more CPU topology broken warnings are seen
> > from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are
> > no CPUs associated with NODE#2/3/4/5.
>
> From a migration point of view it looks fine to me, and doesn't need a compat
> knob since NUMA data (on virt-arm) is only used to construct ACPI tables
> (and we don't version those unless something is broken by it).
>
> > Signed-off-by: Gavin Shan
> > ---
> >  hw/arm/virt.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 141350bf21..b4a95522d3 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> >
> >  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
> >  {
> > -    return idx % ms->numa_state->num_nodes;
> > +    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
>
> I'd like for ARM folks to confirm whether above is correct
> (i.e. socket is NUMA node boundary and also if above topo vars
> could have odd values. Don't look at horribly complicated x86
> as example, but it showed that vendors could stash pretty much
> anything there, so we should consider it here as well and maybe
> forbid that in smp virt-arm parser)

We now have a generic smp parser in machine-smp.c and it guarantees
each machine board a correct group of topo vars: supported topo vars
are valid and unsupported ones have the value 1. I think it's safe to
use them here. Or am I missing something else?

Also, we may not need to include "dies" here because it's not supported
on the ARM virt machine. I believe we will always have "ms->smp.dies==1"
for this machine.

Thanks,
Yanan

> >  }
> >
> >  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> .
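Yanan's second point can be checked with simple arithmetic: if the parser leaves an unsupported level at 1, dropping that factor does not change the divisor. The sketch below assumes exactly that guarantee; `struct topo` and both function names are hypothetical and only mirror the fields named in the patch.

```c
#include <assert.h>

/* Hypothetical mirror of the topology fields named in the patch. */
struct topo {
    unsigned dies, clusters, cores, threads;
};

/* Divisor as written in the patch, including "dies". */
unsigned cpus_per_socket_full(const struct topo *t)
{
    return t->dies * t->clusters * t->cores * t->threads;
}

/*
 * Divisor without "dies". Only equivalent under the assumption that
 * the level is unsupported and therefore fixed at 1.
 */
unsigned cpus_per_socket_no_dies(const struct topo *t)
{
    assert(t->dies == 1);
    return t->clusters * t->cores * t->threads;
}
```

Under that assumption both forms give 3 for the cores=3, threads=1 example from the patch description, so the choice between them is about readability rather than behaviour.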
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
On Wed, 26 Jan 2022 13:24:10 +0800
Gavin Shan wrote:

> The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
> when it isn't provided explicitly. However, the CPU topology isn't fully
> considered in the default association and it causes CPU topology broken
> warnings on booting Linux guest.
>
> For example, the following warning messages are observed when the Linux guest
> is booted with the following command lines.
>
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>   -accel kvm -machine virt,gic-version=host \
>   -cpu host \
>   -smp 6,sockets=2,cores=3,threads=1 \
>   -m 1024M,slots=16,maxmem=64G \
>   -object memory-backend-ram,id=mem0,size=128M \
>   -object memory-backend-ram,id=mem1,size=128M \
>   -object memory-backend-ram,id=mem2,size=128M \
>   -object memory-backend-ram,id=mem3,size=128M \
>   -object memory-backend-ram,id=mem4,size=128M \
>   -object memory-backend-ram,id=mem5,size=384M \
>   -numa node,nodeid=0,memdev=mem0 \
>   -numa node,nodeid=1,memdev=mem1 \
>   -numa node,nodeid=2,memdev=mem2 \
>   -numa node,nodeid=3,memdev=mem3 \
>   -numa node,nodeid=4,memdev=mem4 \
>   -numa node,nodeid=5,memdev=mem5
>          :
>   alternatives: patching kernel code
>   BUG: arch topology borken
>   the CLS domain not a subset of the MC domain
>
>   BUG: arch topology borken
>   the DIE domain not a subset of the NODE domain
>
> With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
> are associated with NODE#0 to NODE#5 separately. That's incorrect because
> CPU#0/1/2 should be associated with same NUMA node because they're seated
> in same socket.
>
> This fixes the issue by considering the socket when default CPU-to-NUMA
> is given. With this applied, no more CPU topology broken warnings are seen
> from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are
> no CPUs associated with NODE#2/3/4/5.

From a migration point of view it looks fine to me, and doesn't need a compat
knob since NUMA data (on virt-arm) is only used to construct ACPI tables
(and we don't version those unless something is broken by it).

> Signed-off-by: Gavin Shan
> ---
>  hw/arm/virt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 141350bf21..b4a95522d3 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>
>  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>  {
> -    return idx % ms->numa_state->num_nodes;
> +    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);

I'd like for ARM folks to confirm whether above is correct
(i.e. socket is NUMA node boundary and also if above topo vars
could have odd values. Don't look at horribly complicated x86
as example, but it showed that vendors could stash pretty much
anything there, so we should consider it here as well and maybe
forbid that in smp virt-arm parser)

>  }
>
>  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
CCing Igor.

Thanks,
drew

On Wed, Jan 26, 2022 at 01:24:10PM +0800, Gavin Shan wrote:
> The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
> when it isn't provided explicitly. However, the CPU topology isn't fully
> considered in the default association and it causes CPU topology broken
> warnings on booting Linux guest.
>
> For example, the following warning messages are observed when the Linux guest
> is booted with the following command lines.
>
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>   -accel kvm -machine virt,gic-version=host \
>   -cpu host \
>   -smp 6,sockets=2,cores=3,threads=1 \
>   -m 1024M,slots=16,maxmem=64G \
>   -object memory-backend-ram,id=mem0,size=128M \
>   -object memory-backend-ram,id=mem1,size=128M \
>   -object memory-backend-ram,id=mem2,size=128M \
>   -object memory-backend-ram,id=mem3,size=128M \
>   -object memory-backend-ram,id=mem4,size=128M \
>   -object memory-backend-ram,id=mem5,size=384M \
>   -numa node,nodeid=0,memdev=mem0 \
>   -numa node,nodeid=1,memdev=mem1 \
>   -numa node,nodeid=2,memdev=mem2 \
>   -numa node,nodeid=3,memdev=mem3 \
>   -numa node,nodeid=4,memdev=mem4 \
>   -numa node,nodeid=5,memdev=mem5
>          :
>   alternatives: patching kernel code
>   BUG: arch topology borken
>   the CLS domain not a subset of the MC domain
>
>   BUG: arch topology borken
>   the DIE domain not a subset of the NODE domain
>
> With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
> are associated with NODE#0 to NODE#5 separately. That's incorrect because
> CPU#0/1/2 should be associated with same NUMA node because they're seated
> in same socket.
>
> This fixes the issue by considering the socket when default CPU-to-NUMA
> is given. With this applied, no more CPU topology broken warnings are seen
> from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are
> no CPUs associated with NODE#2/3/4/5.
>
> Signed-off-by: Gavin Shan
> ---
>  hw/arm/virt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 141350bf21..b4a95522d3 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>
>  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>  {
> -    return idx % ms->numa_state->num_nodes;
> +    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
>  }
>
>  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> --
> 2.23.0
>