[Qemu-devel] [PATCH for 3.1 v3] spapr: Fix ibm,max-associativity-domains property number of nodes
Laurent Vivier reported an off-by-one: the maximum number of NUMA nodes
provided by qemu-kvm is one less than required according to the
description of the "ibm,max-associativity-domains" property in LoPAPR.

It appears that I incorrectly treated the LoPAPR description of this
property, assuming it provides the last valid domain (NUMA node here)
instead of the maximum number of domains.

### Before hot-add

(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 0 MB
node 2 plugged: 0 MB

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

### Hot-add

(qemu) object_add memory-backend-ram,id=mem0,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem0,node=2
(qemu) [   87.704898] pseries-hotplug-mem: Attempting to hot-add 4 ...

[   87.705128] lpar: Attempting to resize HPT to shift 21
...

### After hot-add

(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 1024 MB
node 2 plugged: 1024 MB

$ numactl -H
available: 2 nodes (0-1)

Still only two nodes (and memory hot-added to node 0 below)
node 0 cpus: 0
node 0 size: 1024 MB
node 0 free: 1021 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

After the fix is applied, numactl(8) reports 3 nodes available and
memory plugged into node 2 as expected.

From David Gibson:
--
Qemu makes a distinction between "non NUMA" (nb_numa_nodes == 0) and
"NUMA with one node" (nb_numa_nodes == 1). But from a PAPR guest's
point of view these are equivalent. I don't want to present two
different cases to the guest when we don't need to, so even though the
guest can handle it, I'd prefer we put a '1' here for both the
nb_numa_nodes == 0 and nb_numa_nodes == 1 cases.

This consolidates everything discussed previously on the mailing list.

Fixes: da9f80fbad21 ("spapr: Add ibm,max-associativity-domains property")
Reported-by: Laurent Vivier
Signed-off-by: Serhii Popovych
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7afd1a1..2ee7201 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1033,7 +1033,7 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
         cpu_to_be32(0),
         cpu_to_be32(0),
         cpu_to_be32(0),
-        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes - 1 : 0),
+        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes : 1),
     };

     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
--
1.8.3.1
Re: [Qemu-devel] [Qemu-ppc] [PATCH for 3.1] spapr: Fix ibm,max-associativity-domains property number of nodes
Greg Kurz wrote:
> On Mon, 19 Nov 2018 14:48:34 +0100
> Laurent Vivier wrote:
>
>> On 19/11/2018 14:27, Greg Kurz wrote:
>>> On Mon, 19 Nov 2018 08:09:38 -0500
>>> Serhii Popovych wrote:
>>>
>>>> Laurent Vivier reported off by one with maximum number of NUMA nodes
>>>> provided by qemu-kvm being less by one than required according to
>>>> description of "ibm,max-associativity-domains" property in LoPAPR.
>>>>
>>>> It appears that I incorrectly treated LoPAPR description of this
>>>> property assuming it provides last valid domain (NUMA node here)
>>>> instead of maximum number of domains.
>>>>
>>>> ### Before hot-add
>>>>
>>>> (qemu) info numa
>>>> 3 nodes
>>>> node 0 cpus: 0
>>>> node 0 size: 0 MB
>>>> node 0 plugged: 0 MB
>>>> node 1 cpus:
>>>> node 1 size: 1024 MB
>>>> node 1 plugged: 0 MB
>>>> node 2 cpus:
>>>> node 2 size: 0 MB
>>>> node 2 plugged: 0 MB
>>>>
>>>> $ numactl -H
>>>> available: 2 nodes (0-1)
>>>> node 0 cpus: 0
>>>> node 0 size: 0 MB
>>>> node 0 free: 0 MB
>>>> node 1 cpus:
>>>> node 1 size: 999 MB
>>>> node 1 free: 658 MB
>>>> node distances:
>>>> node   0   1
>>>>   0:  10  40
>>>>   1:  40  10
>>>>
>>>> ### Hot-add
>>>>
>>>> (qemu) object_add memory-backend-ram,id=mem0,size=1G
>>>> (qemu) device_add pc-dimm,id=dimm1,memdev=mem0,node=2
>>>> (qemu) [   87.704898] pseries-hotplug-mem: Attempting to hot-add 4 ...
>>>>
>>>> [   87.705128] lpar: Attempting to resize HPT to shift 21
>>>> ...
>>>>
>>>> ### After hot-add
>>>>
>>>> (qemu) info numa
>>>> 3 nodes
>>>> node 0 cpus: 0
>>>> node 0 size: 0 MB
>>>> node 0 plugged: 0 MB
>>>> node 1 cpus:
>>>> node 1 size: 1024 MB
>>>> node 1 plugged: 0 MB
>>>> node 2 cpus:
>>>> node 2 size: 1024 MB
>>>> node 2 plugged: 1024 MB
>>>>
>>>> $ numactl -H
>>>> available: 2 nodes (0-1)
>>>>
>>>> Still only two nodes (and memory hot-added to node 0 below)
>>>> node 0 cpus: 0
>>>> node 0 size: 1024 MB
>>>> node 0 free: 1021 MB
>>>> node 1 cpus:
>>>> node 1 size: 999 MB
>>>> node 1 free: 658 MB
>>>> node distances:
>>>> node   0   1
>>>>   0:  10  40
>>>>   1:  40  10
>>>>
>>>> After fix applied numactl(8) reports 3 nodes available and memory
>>>> plugged into node 2 as expected.
>>>>
>>>> Fixes: da9f80fbad21 ("spapr: Add ibm,max-associativity-domains property")
>>>> Reported-by: Laurent Vivier
>>>> Signed-off-by: Serhii Popovych
>>>> ---
>>>>  hw/ppc/spapr.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 7afd1a1..843ae6c 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -1033,7 +1033,7 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
>>>>          cpu_to_be32(0),
>>>>          cpu_to_be32(0),
>>>>          cpu_to_be32(0),
>>>> -        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes - 1 : 0),
>>>> +        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes : 0),
>>>
>>> Maybe simply cpu_to_be32(nb_numa_nodes) ?
>>
>> Or "cpu_to_be32(nb_numa_nodes ? nb_numa_nodes : 1)" ?

Linux handles zero correctly, but nb_numa_nodes ?: 1 looks better.

I did testing with just cpu_to_be32(nb_numa_nodes) and with
cpu_to_be32(nb_numa_nodes ? nb_numa_nodes : 1); it works with Linux
correctly in both cases:

(guest)# numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 487 MB
node 0 free: 148 MB
node distances:
node   0
  0:  10

(qemu) info numa
0 nodes

>>
>> In spapr_populate_drconf_memory() we have this logic.
>>
>
> Hmm... maybe you're right, it seems that the code assumes
> non-NUMA configs have at least one node. Similar assumption is
> also present in pc_dimm_realize():
>
>     if (((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) ||
>         (!nb_numa_nodes && dimm->node)) {

According to this, nb_numa_nodes can be zero

>         error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
>                    PRIu32 "' which exceeds the number of numa nodes: %d",
>                    dimm->node, nb_numa_nodes ? nb_numa_nodes : 1);

and this just handles that case to show a proper error message.

>         return;
>     }
>
> This is a bit confusing...
>
>> Thanks,
>> Laurent
>

--
Thanks,
Serhii
[Qemu-devel] [PATCH for 3.1 v2] spapr: Fix ibm,max-associativity-domains property number of nodes
Laurent Vivier reported an off-by-one: the maximum number of NUMA nodes
provided by qemu-kvm is one less than required according to the
description of the "ibm,max-associativity-domains" property in LoPAPR.

It appears that I incorrectly treated the LoPAPR description of this
property, assuming it provides the last valid domain (NUMA node here)
instead of the maximum number of domains.

### Before hot-add

(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 0 MB
node 2 plugged: 0 MB

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

### Hot-add

(qemu) object_add memory-backend-ram,id=mem0,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem0,node=2
(qemu) [   87.704898] pseries-hotplug-mem: Attempting to hot-add 4 ...

[   87.705128] lpar: Attempting to resize HPT to shift 21
...

### After hot-add

(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 1024 MB
node 2 plugged: 1024 MB

$ numactl -H
available: 2 nodes (0-1)

Still only two nodes (and memory hot-added to node 0 below)
node 0 cpus: 0
node 0 size: 1024 MB
node 0 free: 1021 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

After the fix is applied, numactl(8) reports 3 nodes available and
memory plugged into node 2 as expected.

Fixes: da9f80fbad21 ("spapr: Add ibm,max-associativity-domains property")
Reported-by: Laurent Vivier
Signed-off-by: Serhii Popovych
---
v2
  Remove the now unneeded ?: statement previously used to catch -1 as
  a NUMA node, which caused Linux guests to hang on boot.

 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7afd1a1..a7171fb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1033,7 +1033,7 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
         cpu_to_be32(0),
         cpu_to_be32(0),
         cpu_to_be32(0),
-        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes - 1 : 0),
+        cpu_to_be32(nb_numa_nodes),
     };

     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
--
1.8.3.1
Re: [Qemu-devel] [Qemu-ppc] [PATCH for 3.1] spapr: Fix ibm,max-associativity-domains property number of nodes
Laurent Vivier wrote:
> On 19/11/2018 14:27, Greg Kurz wrote:
>> On Mon, 19 Nov 2018 08:09:38 -0500
>> Serhii Popovych wrote:
>>
>>> Laurent Vivier reported off by one with maximum number of NUMA nodes
>>> provided by qemu-kvm being less by one than required according to
>>> description of "ibm,max-associativity-domains" property in LoPAPR.
>>>
>>> It appears that I incorrectly treated LoPAPR description of this
>>> property assuming it provides last valid domain (NUMA node here)
>>> instead of maximum number of domains.
>>>
>>> ### Before hot-add
>>>
>>> (qemu) info numa
>>> 3 nodes
>>> node 0 cpus: 0
>>> node 0 size: 0 MB
>>> node 0 plugged: 0 MB
>>> node 1 cpus:
>>> node 1 size: 1024 MB
>>> node 1 plugged: 0 MB
>>> node 2 cpus:
>>> node 2 size: 0 MB
>>> node 2 plugged: 0 MB
>>>
>>> $ numactl -H
>>> available: 2 nodes (0-1)
>>> node 0 cpus: 0
>>> node 0 size: 0 MB
>>> node 0 free: 0 MB
>>> node 1 cpus:
>>> node 1 size: 999 MB
>>> node 1 free: 658 MB
>>> node distances:
>>> node   0   1
>>>   0:  10  40
>>>   1:  40  10
>>>
>>> ### Hot-add
>>>
>>> (qemu) object_add memory-backend-ram,id=mem0,size=1G
>>> (qemu) device_add pc-dimm,id=dimm1,memdev=mem0,node=2
>>> (qemu) [   87.704898] pseries-hotplug-mem: Attempting to hot-add 4 ...
>>>
>>> [   87.705128] lpar: Attempting to resize HPT to shift 21
>>> ...
>>>
>>> ### After hot-add
>>>
>>> (qemu) info numa
>>> 3 nodes
>>> node 0 cpus: 0
>>> node 0 size: 0 MB
>>> node 0 plugged: 0 MB
>>> node 1 cpus:
>>> node 1 size: 1024 MB
>>> node 1 plugged: 0 MB
>>> node 2 cpus:
>>> node 2 size: 1024 MB
>>> node 2 plugged: 1024 MB
>>>
>>> $ numactl -H
>>> available: 2 nodes (0-1)
>>>              ^^^^
>>> Still only two nodes (and memory hot-added to node 0 below)
>>> node 0 cpus: 0
>>> node 0 size: 1024 MB
>>> node 0 free: 1021 MB
>>> node 1 cpus:
>>> node 1 size: 999 MB
>>> node 1 free: 658 MB
>>> node distances:
>>> node   0   1
>>>   0:  10  40
>>>   1:  40  10
>>>
>>> After fix applied numactl(8) reports 3 nodes available and memory
>>> plugged into node 2 as expected.
>>>
>>> Fixes: da9f80fbad21 ("spapr: Add ibm,max-associativity-domains property")
>>> Reported-by: Laurent Vivier
>>> Signed-off-by: Serhii Popovych
>>> ---
>>>  hw/ppc/spapr.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 7afd1a1..843ae6c 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -1033,7 +1033,7 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
>>>          cpu_to_be32(0),
>>>          cpu_to_be32(0),
>>>          cpu_to_be32(0),
>>> -        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes - 1 : 0),
>>> +        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes : 0),
>>
>> Maybe simply cpu_to_be32(nb_numa_nodes) ?
>
> I agree the "? : " is not needed.
>
> With "cpu_to_be32(nb_numa_nodes)":
>

Agree, ?: was relevant only to catch the -1 case when running a guest
w/o NUMA config. Will send v2. Thanks for the quick review.

> Reviewed-by: Laurent Vivier
>
> Thanks,
> Laurent
>

--
Thanks,
Serhii
[Qemu-devel] [PATCH for 3.1] spapr: Fix ibm,max-associativity-domains property number of nodes
Laurent Vivier reported an off-by-one: the maximum number of NUMA nodes
provided by qemu-kvm is one less than required according to the
description of the "ibm,max-associativity-domains" property in LoPAPR.

It appears that I incorrectly treated the LoPAPR description of this
property, assuming it provides the last valid domain (NUMA node here)
instead of the maximum number of domains.

### Before hot-add

(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 0 MB
node 2 plugged: 0 MB

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

### Hot-add

(qemu) object_add memory-backend-ram,id=mem0,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem0,node=2
(qemu) [   87.704898] pseries-hotplug-mem: Attempting to hot-add 4 ...

[   87.705128] lpar: Attempting to resize HPT to shift 21
...

### After hot-add

(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 1024 MB
node 2 plugged: 1024 MB

$ numactl -H
available: 2 nodes (0-1)

Still only two nodes (and memory hot-added to node 0 below)
node 0 cpus: 0
node 0 size: 1024 MB
node 0 free: 1021 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

After the fix is applied, numactl(8) reports 3 nodes available and
memory plugged into node 2 as expected.

Fixes: da9f80fbad21 ("spapr: Add ibm,max-associativity-domains property")
Reported-by: Laurent Vivier
Signed-off-by: Serhii Popovych
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7afd1a1..843ae6c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1033,7 +1033,7 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
         cpu_to_be32(0),
         cpu_to_be32(0),
         cpu_to_be32(0),
-        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes - 1 : 0),
+        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes : 0),
     };

     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
--
1.8.3.1
Re: [Qemu-devel] [PATCH for 2.13 v3 1/2] spapr: Add ibm,max-associativity-domains property
David Gibson wrote:
> On Tue, Apr 17, 2018 at 09:28:42AM +0530, Bharata B Rao wrote:
>> On Mon, Apr 16, 2018 at 07:47:29PM +0300, Serhii Popovych wrote:
>>> Bharata B Rao wrote:
>>>> On Wed, Apr 11, 2018 at 02:41:59PM -0400, Serhii Popovych wrote:
>>>>> Now recent kernels (i.e. since linux-stable commit a346137e9142
>>>>> ("powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes"))
>>>>> support this property to mark initially memory-less NUMA nodes as "possible"
>>>>> to allow further memory hot-add to them.
>>>>>
>>>>> Advertise this property for pSeries machines to let guest kernels detect
>>>>> maximum supported node configuration and benefit from kernel side change
>>>>> when hot-add memory to specific, possibly empty before, NUMA node.
>>>>>
>>>>> Signed-off-by: Serhii Popovych <spopo...@redhat.com>
>>>>> ---
>>>>>  hw/ppc/spapr.c | 10 ++
>>>>>  1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>> index a81570e..c05bbad 100644
>>>>> --- a/hw/ppc/spapr.c
>>>>> +++ b/hw/ppc/spapr.c
>>>>> @@ -910,6 +910,13 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
>>>>>          0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE),
>>>>>          cpu_to_be32(max_cpus / smp_threads),
>>>>>      };
>>>>> +    uint32_t maxdomains[] = {
>>>>> +        cpu_to_be32(4),
>>>>> +        cpu_to_be32(0),
>>>>> +        cpu_to_be32(0),
>>>>> +        cpu_to_be32(0),
>>>>> +        cpu_to_be32(nb_numa_nodes - 1),
>>>>> +    };
>>>>>
>>>>>      _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
>>>>>
>>>>> @@ -946,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
>>>>>      _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
>>>>>                       refpoints, sizeof(refpoints)));
>>>>>
>>>>> +    _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
>>>>> +                     maxdomains, sizeof(maxdomains)));
>>>>> +
>>>>>      _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max",
>>>>>                            RTAS_ERROR_LOG_MAX));
>>>>>      _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
>>>>
>>>> This commit causes hash guest with latest guest kernel to hang at early
>>>> boot.
>>>
>>> I use v4.16 tag from stable and can't reproduce on P8 machine reported
>>> issue.
>>>
>>> Could you please share more details about your setup, kernel commit id
>>> you spot problem?
>>
>> I am on 4.16.0-rc7 (commit id: 0b412605ef5f)
>>
>> BTW this happens only for non-NUMA guest.
>
> Ah, that might explain it.  With no NUMA nodes specified, I think this
> code will put a -1 into the max-associativity-domains property, which
> is probably causing the mess.  If we don't have NUMA (nb_numa_nodes ==
> 0) we probably want to either omit the property entirely, or clamp
> that 5th cell to 0.
>

Ok, the proposed fix has already been posted to qemu-devel. Sorry,
forgot to CC.

The mail subject contains (w/o quotes):

"spapr: Correct max associativity domains value for non-NUMA configs"

Tested with v4.16 tag:
  o Before : non-NUMA configs: able to reproduce, stall during boot
  o After  : non-NUMA configs: not reproducible, boot is ok

--
Thanks,
Serhii
[Qemu-devel] [PATCH for 2.13] spapr: Correct max associativity domains value for non-NUMA configs
In non-NUMA configurations nb_numa_nodes is zero and we set the 5th cell
in ibm,max-associativity-domains to -1. That causes Linux guests to
stall during boot after the following line:

[    0.000000] NUMA associativity depth for CPU/Memory: 4

Set the last possible NUMA node in the property to zero to correctly
support non-NUMA guests.

Fixes: c1df49a670ef ("spapr: Add ibm,max-associativity-domains property")
Signed-off-by: Serhii Popovych <spopo...@redhat.com>
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7b2bc4e..bff2125 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -914,7 +914,7 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
         cpu_to_be32(0),
         cpu_to_be32(0),
         cpu_to_be32(0),
-        cpu_to_be32(nb_numa_nodes - 1),
+        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes - 1 : 0),
     };

     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
--
1.8.3.1
Re: [Qemu-devel] [PATCH for 2.13 v3 1/2] spapr: Add ibm,max-associativity-domains property
Bharata B Rao wrote:
> On Wed, Apr 11, 2018 at 02:41:59PM -0400, Serhii Popovych wrote:
>> Now recent kernels (i.e. since linux-stable commit a346137e9142
>> ("powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes"))
>> support this property to mark initially memory-less NUMA nodes as "possible"
>> to allow further memory hot-add to them.
>>
>> Advertise this property for pSeries machines to let guest kernels detect
>> maximum supported node configuration and benefit from kernel side change
>> when hot-add memory to specific, possibly empty before, NUMA node.
>>
>> Signed-off-by: Serhii Popovych <spopo...@redhat.com>
>> ---
>>  hw/ppc/spapr.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index a81570e..c05bbad 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -910,6 +910,13 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
>>          0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE),
>>          cpu_to_be32(max_cpus / smp_threads),
>>      };
>> +    uint32_t maxdomains[] = {
>> +        cpu_to_be32(4),
>> +        cpu_to_be32(0),
>> +        cpu_to_be32(0),
>> +        cpu_to_be32(0),
>> +        cpu_to_be32(nb_numa_nodes - 1),
>> +    };
>>
>>      _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
>>
>> @@ -946,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
>>      _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
>>                       refpoints, sizeof(refpoints)));
>>
>> +    _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
>> +                     maxdomains, sizeof(maxdomains)));
>> +
>>      _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max",
>>                            RTAS_ERROR_LOG_MAX));
>>      _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
>> --
>> 1.8.3.1
>
> This commit causes hash guest with latest guest kernel to hang at early boot.

I use the v4.16 tag from stable and can't reproduce the reported issue
on a P8 machine.

Could you please share more details about your setup and the kernel
commit id where you spot the problem?

>
> Quiescing Open Firmware ...
> Booting Linux via __start() @ 0x0200 ...
> [0.00] hash-mmu: Page sizes from device-tree:
> [0.00] hash-mmu: base_shift=12: shift=12, sllp=0x, avpnm=0x, tlbiel=1, penc=0
> [0.00] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x, tlbiel=1, penc=1
> [0.00] Using 1TB segments
> [0.00] hash-mmu: Initializing hash mmu with SLB
> [0.00] Linux version 4.16.0-rc7+ (root@localhost.localdomain) (gcc version 7.1.1 20170622 (Red Hat 7.1.1-3) (GCC)) #60 SMP Wed Apr 11 10:36:22 IST 2018
> [0.00] Found initrd at 0xc3c0:0xc4f9a34c
> [0.00] Using pSeries machine description
> [0.00] bootconsole [udbg0] enabled
> [0.00] Partition configured for 32 cpus.
> [0.00] CPU maps initialized for 1 thread per core
> [0.00] -
> [0.00] ppc64_pft_size= 0x1a
> [0.00] phys_mem_size = 0x2
> [0.00] dcache_bsize = 0x80
> [0.00] icache_bsize = 0x80
> [0.00] cpu_features = 0x077c7a6c18500249
> [0.00] possible= 0x18500649
> [0.00] always = 0x18100040
> [0.00] cpu_user_features = 0xdc0065c2 0xae00
> [0.00] mmu_features = 0x78006001
> [0.00] firmware_features = 0x0001415a445f
> [0.00] htab_hash_mask= 0x7
> [0.00] -
>
> No progess after this.
>

--
Thanks,
Serhii
[Qemu-devel] [PATCH for 2.13 v3 1/2] spapr: Add ibm,max-associativity-domains property
Recent kernels (i.e. since linux-stable commit a346137e9142
("powerpc/numa: Use ibm,max-associativity-domains to discover possible
nodes")) support this property to mark initially memory-less NUMA nodes
as "possible", to allow further memory hot-add to them.

Advertise this property for pSeries machines to let guest kernels
detect the maximum supported node configuration and benefit from the
kernel-side change when hot-adding memory to a specific, possibly
previously empty, NUMA node.

Signed-off-by: Serhii Popovych <spopo...@redhat.com>
---
 hw/ppc/spapr.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a81570e..c05bbad 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -910,6 +910,13 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
         0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE),
         cpu_to_be32(max_cpus / smp_threads),
     };
+    uint32_t maxdomains[] = {
+        cpu_to_be32(4),
+        cpu_to_be32(0),
+        cpu_to_be32(0),
+        cpu_to_be32(0),
+        cpu_to_be32(nb_numa_nodes - 1),
+    };

     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));

@@ -946,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
     _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
                      refpoints, sizeof(refpoints)));

+    _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
+                     maxdomains, sizeof(maxdomains)));
+
     _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max",
                           RTAS_ERROR_LOG_MAX));
     _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
--
1.8.3.1
[Qemu-devel] [PATCH for 2.13 v3 0/2] target/ppc: Support adding memory to initially memory-less NUMA nodes
Now that the PowerPC Linux kernel supports hot-add to NUMA nodes not
initially populated with memory, we can enable such support in qemu.

This requires two changes:

 o Add device tree property "ibm,max-associativity-domains" to give the
   guest kernel a chance to find the max possible NUMA node

 o Revert commit b556854bd852 ("spapr: Don't allow memory hotplug to
   memory less nodes") to remove the check for hot-add to a memory-less
   node.

See the description messages of the individual changes for more details.

v3:
 - Make the layer for max_cpus unspecified instead of setting it to
   zero. Not adding cpu_to_be32(spapr_vcpu_id(spapr, max_cpus - 1))
   because at the moment we only want the max number of NUMA nodes to
   enable the feature.
 - Rebase to current state of master branch.

v2:
 - Reorder patches in series according to description above.
 - Add extra comment to the revert, noting the return to previous
   behaviour for guests without support for hot-add to an empty node.
 - Drop max_cpus from topology in the property due to discontiguous
   vcpu id allocations. Thanks to David Gibson for the extra
   explanation.
 - Rebase to current state of master branch.

Serhii Popovych (2):
  spapr: Add ibm,max-associativity-domains property
  Revert "spapr: Don't allow memory hotplug to memory less nodes"

 hw/ppc/spapr.c | 32 ++--
 1 file changed, 10 insertions(+), 22 deletions(-)

--
1.8.3.1
[Qemu-devel] [PATCH for 2.13 v3 2/2] Revert "spapr: Don't allow memory hotplug to memory less nodes"
This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793.

Keep the change of @node type from uint32_t to int from the reverted
commit, because with uint32_t "node < 0" is always false.

Note that without implementing a capability or some trick to detect
whether the guest kernel supports hot-add to memory-less nodes, this
returns the previous behaviour where memory is added to the first
non-empty node.

Signed-off-by: Serhii Popovych <spopo...@redhat.com>
---
 hw/ppc/spapr.c | 22 --
 1 file changed, 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c05bbad..1e7983c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3488,28 +3488,6 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
             return;
         }

-        /*
-         * Currently PowerPC kernel doesn't allow hot-adding memory to
-         * memory-less node, but instead will silently add the memory
-         * to the first node that has some memory. This causes two
-         * unexpected behaviours for the user.
-         *
-         * - Memory gets hotplugged to a different node than what the user
-         *   specified.
-         * - Since pc-dimm subsystem in QEMU still thinks that memory belongs
-         *   to memory-less node, a reboot will set things accordingly
-         *   and the previously hotplugged memory now ends in the right node.
-         *   This appears as if some memory moved from one node to another.
-         *
-         * So until kernel starts supporting memory hotplug to memory-less
-         * nodes, just prevent such attempts upfront in QEMU.
-         */
-        if (nb_numa_nodes && !numa_info[node].node_mem) {
-            error_setg(errp, "Can't hotplug memory to memory-less node %d",
-                       node);
-            return;
-        }
-
         spapr_memory_plug(hotplug_dev, dev, node, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
         spapr_core_plug(hotplug_dev, dev, errp);
--
1.8.3.1
[Qemu-devel] [PATCH for 2.13 v2 2/2] Revert "spapr: Don't allow memory hotplug to memory less nodes"
This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793.

Keep the change of @node type from uint32_t to int from the reverted
commit, because with uint32_t "node < 0" is always false.

Note that without implementing a capability or some trick to detect
whether the guest kernel supports hot-add to memory-less nodes, this
returns the previous behaviour where memory is added to the first
non-empty node.

Signed-off-by: Serhii Popovych <spopo...@redhat.com>
---
 hw/ppc/spapr.c | 22 --
 1 file changed, 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3f61785..cd7a347 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3488,28 +3488,6 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
             return;
         }

-        /*
-         * Currently PowerPC kernel doesn't allow hot-adding memory to
-         * memory-less node, but instead will silently add the memory
-         * to the first node that has some memory. This causes two
-         * unexpected behaviours for the user.
-         *
-         * - Memory gets hotplugged to a different node than what the user
-         *   specified.
-         * - Since pc-dimm subsystem in QEMU still thinks that memory belongs
-         *   to memory-less node, a reboot will set things accordingly
-         *   and the previously hotplugged memory now ends in the right node.
-         *   This appears as if some memory moved from one node to another.
-         *
-         * So until kernel starts supporting memory hotplug to memory-less
-         * nodes, just prevent such attempts upfront in QEMU.
-         */
-        if (nb_numa_nodes && !numa_info[node].node_mem) {
-            error_setg(errp, "Can't hotplug memory to memory-less node %d",
-                       node);
-            return;
-        }
-
         spapr_memory_plug(hotplug_dev, dev, node, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
         spapr_core_plug(hotplug_dev, dev, errp);
--
1.8.3.1
[Qemu-devel] [PATCH for 2.13 v2 0/2] target/ppc: Support adding memory to initially memory-less NUMA nodes
Now that the PowerPC Linux kernel supports hot-add to NUMA nodes not
initially populated with memory, we can enable such support in qemu.

This requires two changes:

 o Add device tree property "ibm,max-associativity-domains" to give the
   guest kernel a chance to find the max possible NUMA node

 o Revert commit b556854bd852 ("spapr: Don't allow memory hotplug to
   memory less nodes") to remove the check for hot-add to a memory-less
   node.

See the description messages of the individual changes for more details.

v2:
 - Reorder patches in series according to description above.
 - Add extra comment to the revert, noting the return to previous
   behaviour for guests without support for hot-add to an empty node.
 - Drop max_cpus from topology in the property due to discontiguous
   vcpu id allocations. Thanks to David Gibson for the extra
   explanation.
 - Rebase to current state of master branch.

Serhii Popovych (2):
  spapr: Add ibm,max-associativity-domains property
  Revert "spapr: Don't allow memory hotplug to memory less nodes"

 hw/ppc/spapr.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

--
1.8.3.1
[Qemu-devel] [PATCH for 2.13 v2 1/2] spapr: Add ibm,max-associativity-domains property
Recent kernels (i.e. since linux-stable commit a346137e9142
("powerpc/numa: Use ibm,max-associativity-domains to discover possible
nodes")) support this property to mark initially memory-less NUMA nodes
as "possible", to allow further memory hot-add to them.

Advertise this property for pSeries machines to let guest kernels
detect the maximum supported node configuration and benefit from the
kernel-side change when hot-adding memory to a specific, possibly
previously empty, NUMA node.

Signed-off-by: Serhii Popovych <spopo...@redhat.com>
---
 hw/ppc/spapr.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2c0be8c..3f61785 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -909,6 +909,14 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
         0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE),
         cpu_to_be32(max_cpus / smp_threads),
     };
+    uint32_t maxdomains[] = {
+        cpu_to_be32(5),
+        cpu_to_be32(0),
+        cpu_to_be32(0),
+        cpu_to_be32(0),
+        cpu_to_be32(nb_numa_nodes - 1),
+        cpu_to_be32(0),
+    };

     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));

@@ -945,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
     _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
                      refpoints, sizeof(refpoints)));

+    _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
+                     maxdomains, sizeof(maxdomains)));
+
     _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max",
                           RTAS_ERROR_LOG_MAX));
     _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
--
1.8.3.1
Re: [Qemu-devel] [PATCH for 2.13 1/2] Revert "spapr: Don't allow memory hotplug to memory less nodes"
Bharata B Rao wrote:
> On Thu, Apr 05, 2018 at 10:35:22AM -0400, Serhii Popovych wrote:
>> This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793.
>>
>> Leave change @node type from uint32_t to to int from reverted commit
>> because node < 0 is always false.
>>
>> Signed-off-by: Serhii Popovych <spopo...@redhat.com>
>> ---
>>  hw/ppc/spapr.c | 22 --
>>  1 file changed, 22 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 2c0be8c..3ad4545 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3477,28 +3477,6 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>>              return;
>>          }
>>
>> -        /*
>> -         * Currently PowerPC kernel doesn't allow hot-adding memory to
>> -         * memory-less node, but instead will silently add the memory
>> -         * to the first node that has some memory. This causes two
>> -         * unexpected behaviours for the user.
>> -         *
>> -         * - Memory gets hotplugged to a different node than what the user
>> -         *   specified.
>> -         * - Since pc-dimm subsystem in QEMU still thinks that memory belongs
>> -         *   to memory-less node, a reboot will set things accordingly
>> -         *   and the previously hotplugged memory now ends in the right node.
>> -         *   This appears as if some memory moved from one node to another.
>> -         *
>> -         * So until kernel starts supporting memory hotplug to memory-less
>> -         * nodes, just prevent such attempts upfront in QEMU.
>> -         */
>> -        if (nb_numa_nodes && !numa_info[node].node_mem) {
>> -            error_setg(errp, "Can't hotplug memory to memory-less node %d",
>> -                       node);
>> -            return;
>> -        }
>> -
>
> If you remove this unconditionally, wouldn't it be a problem in case
> of newer QEMU with older guest kernels ?

Yes, that definitely would affect guest kernels without such support.
We probably need to add some capability to test for guest kernel
functionality presence.

> Regards,
> Bharata.
>

--
Thanks,
Serhii
[Qemu-devel] [PATCH for 2.13 0/2] target/ppc: Support adding memory to initially memory-less NUMA nodes
Now that the PowerPC Linux kernel supports hot-add to NUMA nodes not
populated with memory initially, we can enable such support in qemu.

This requires two changes:

 o Add device tree property "ibm,max-associativity-domains" to give the
   guest kernel a chance to find the maximum possible NUMA node

 o Revert commit b556854bd852 ("spapr: Don't allow memory hotplug to
   memory less nodes") to remove the check for hot-add to a memory-less
   node

See the description messages of the individual changes for more details.

Serhii Popovych (2):
  Revert "spapr: Don't allow memory hotplug to memory less nodes"
  spapr: Add ibm,max-associativity-domains property

 hw/ppc/spapr.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

--
1.8.3.1
[Qemu-devel] [PATCH for 2.13 2/2] spapr: Add ibm,max-associativity-domains property
Recent kernels (since linux-stable commit a346137e9142 ("powerpc/numa:
Use ibm,max-associativity-domains to discover possible nodes")) support
this property to mark initially memory-less NUMA nodes as "possible",
allowing memory to be hot-added to them later.

Advertise this property for pSeries machines to let guest kernels detect
the maximum supported node configuration and benefit from the kernel-side
change when hot-adding memory to a specific, possibly initially empty,
NUMA node.

Signed-off-by: Serhii Popovych <spopo...@redhat.com>
---
 hw/ppc/spapr.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3ad4545..e02fc94 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -909,6 +909,14 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
         0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE),
         cpu_to_be32(max_cpus / smp_threads),
     };
+    uint32_t maxdomains[] = {
+        cpu_to_be32(5),
+        cpu_to_be32(0),
+        cpu_to_be32(0),
+        cpu_to_be32(0),
+        cpu_to_be32(nb_numa_nodes - 1),
+        cpu_to_be32(max_cpus - 1),
+    };

     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));

@@ -945,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
     _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
                      refpoints, sizeof(refpoints)));

+    _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
+                     maxdomains, sizeof(maxdomains)));
+
     _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max", RTAS_ERROR_LOG_MAX));

     _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
--
1.8.3.1
[Qemu-devel] [PATCH for 2.13 1/2] Revert "spapr: Don't allow memory hotplug to memory less nodes"
This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793.

Keep the @node type change from uint32_t to int from the reverted
commit, because with uint32_t the node < 0 check is always false.

Signed-off-by: Serhii Popovych <spopo...@redhat.com>
---
 hw/ppc/spapr.c | 22 --
 1 file changed, 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2c0be8c..3ad4545 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3477,28 +3477,6 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
         return;
     }

-    /*
-     * Currently PowerPC kernel doesn't allow hot-adding memory to
-     * memory-less node, but instead will silently add the memory
-     * to the first node that has some memory. This causes two
-     * unexpected behaviours for the user.
-     *
-     * - Memory gets hotplugged to a different node than what the user
-     *   specified.
-     * - Since pc-dimm subsystem in QEMU still thinks that memory belongs
-     *   to memory-less node, a reboot will set things accordingly
-     *   and the previously hotplugged memory now ends in the right node.
-     *   This appears as if some memory moved from one node to another.
-     *
-     * So until kernel starts supporting memory hotplug to memory-less
-     * nodes, just prevent such attempts upfront in QEMU.
-     */
-    if (nb_numa_nodes && !numa_info[node].node_mem) {
-        error_setg(errp, "Can't hotplug memory to memory-less node %d",
-                   node);
-        return;
-    }
-
     spapr_memory_plug(hotplug_dev, dev, node, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
     spapr_core_plug(hotplug_dev, dev, errp);
--
1.8.3.1