As per PAPR, there are 2 device tree property ibm,max-associativity-domains (which defines the maximum number of domains that the firmware i.e PowerVM can support) and ibm,current-associativity-domains (which defines the maximum number of domains that the platform can support). Value of ibm,max-associativity-domains property is always greater than or equal to ibm,current-associativity-domains property.
Powerpc currently uses ibm,max-associativity-domains property while setting the possible number of nodes. This is currently set at 32. However the possible number of nodes for a platform may be significantly less. Hence set the possible number of nodes based on ibm,current-associativity-domains property. $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains /proc/device-tree/rtas/ibm,current-associativity-domains 00000005 00000001 00000002 00000002 00000002 00000010 /proc/device-tree/rtas/ibm,max-associativity-domains 00000005 00000001 00000008 00000020 00000020 00000100 $ cat /sys/devices/system/node/possible ##Before patch 0-31 $ cat /sys/devices/system/node/possible ##After patch 0-1 Note the maximum nodes this platform can support is only 2 but the possible nodes is set to 32. This is important because lot of kernel and user space code allocate structures for all possible nodes leading to a lot of memory that is allocated but not used. I ran a simple experiment to create and destroy 100 memory cgroups on boot on a 8 node machine (Power8 Alpine). Before patch free -k at boot total used free shared buff/cache available Mem: 523498176 4106816 518820608 22272 570752 516606720 Swap: 4194240 0 4194240 free -k after creating 100 memory cgroups total used free shared buff/cache available Mem: 523498176 4628416 518246464 22336 623296 516058688 Swap: 4194240 0 4194240 free -k after destroying 100 memory cgroups total used free shared buff/cache available Mem: 523498176 4697408 518173760 22400 627008 515987904 Swap: 4194240 0 4194240 After patch free -k at boot total used free shared buff/cache available Mem: 523498176 3969472 518933888 22272 594816 516731776 Swap: 4194240 0 4194240 free -k after creating 100 memory cgroups total used free shared buff/cache available Mem: 523498176 4181888 518676096 22208 640192 516496448 Swap: 4194240 0 4194240 free -k after destroying 100 memory cgroups total used free shared buff/cache available Mem: 523498176 4232320 518619904 22272 645952 516443264 Swap: 4194240 0 4194240 Observations: Fixed kernel takes 137344 kb (4106816-3969472) less to boot. Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs. Cc: Nathan Lynch <nath...@linux.ibm.com> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Anton Blanchard <an...@ozlabs.org> Cc: Bharata B Rao <bhar...@linux.ibm.com> Cc: Tyrel Datwyler <tyr...@linux.ibm.com> Signed-off-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com> --- v1: https://lore.kernel.org/linuxppc-dev/20200706064002.14848-1-sri...@linux.vnet.ibm.com/t/#u Changelog v1->v2: Fallback to ibm,max-associativity-domains if ibm,current-associativity is not available. Suggested by Tyrel Datwyler. arch/powerpc/mm/numa.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 9fcf2d195830..fc7b0505bdd8 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -896,10 +896,19 @@ static void __init find_possible_nodes(void) if (!rtas) return; - if (of_property_read_u32_index(rtas, - "ibm,max-associativity-domains", + if (of_property_read_u32_index(rtas, "ibm,current-associativity-domains", + min_common_depth, &numnodes)) { + /* + * ibm,current-associativity-domains is a fairly recent + * property. If it doesn't exist, then fallback on + * ibm,max-associativity-domains. Current denotes what the + * platform can support compared to max which denotes what the + * Hypervisor can support. + */ + if (of_property_read_u32_index(rtas, "ibm,max-associativity-domains", min_common_depth, &numnodes)) - goto out; + goto out; + } for (i = 0; i < numnodes; i++) { if (!node_possible(i)) -- 2.18.2