On Fri, Jun 2, 2017 at 3:14 PM, Michael Ellerman <m...@ellerman.id.au> wrote: > In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we > switched to the generic implementation of cpu_to_node(), which uses a percpu > variable to hold the NUMA node for each CPU. > > Unfortunately we neglected to notice that we use cpu_to_node() in the > allocation > of our percpu areas, leading to a chicken and egg problem. In practice what > happens is when we are setting up the percpu areas, cpu_to_node() reports that > all CPUs are on node 0, so we allocate all percpu areas on node 0. > > This is visible in the dmesg output, as all pcpu allocs being in group 0: > > pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07 > pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15 > pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23 > pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31 > pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39 > pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47 > > To fix it we need an early_cpu_to_node() which can run prior to percpu being > setup. We already have the numa_cpu_lookup_table we can use, so just plumb it > in. With the patch dmesg output shows two groups, 0 and 1: > > pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07 > pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15 > pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23 > pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31 > pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39 > pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47 > > We can also check the data_offset in the paca of various CPUs, with the fix we > see: > > CPU 0: data_offset = 0x0ffe8b0000 > CPU 24: data_offset = 0x1ffe5b0000 > > And we can see from dmesg that CPU 24 has an allocation on node 1: > > node 0: [mem 0x0000000000000000-0x0000000fffffffff] > node 1: [mem 0x0000001000000000-0x0000001fffffffff] > > Cc: sta...@vger.kernel.org # v3.16+ > Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID") > Signed-off-by: Michael Ellerman <m...@ellerman.id.au> > --- > arch/powerpc/include/asm/topology.h | 14 ++++++++++++++ > arch/powerpc/kernel/setup_64.c | 19 ++++++++++++++----- > 2 files changed, 28 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/include/asm/topology.h > b/arch/powerpc/include/asm/topology.h > index 8b3b46b7b0f2..8f3b2ec09b9e 100644 > --- a/arch/powerpc/include/asm/topology.h > +++ b/arch/powerpc/include/asm/topology.h > @@ -44,8 +44,22 @@ extern void __init dump_numa_cpu_topology(void); > extern int sysfs_add_device_to_node(struct device *dev, int nid); > extern void sysfs_remove_device_from_node(struct device *dev, int nid); > > +static inline int early_cpu_to_node(int cpu) > +{ > + int nid; > + > + nid = numa_cpu_lookup_table[cpu]; > + > + /* > + * Some functions, eg. node_distance() don't cope with -1, so instead > + * fall back to node 0 if nid is unset (it should be, except bugs). > + */ > + return (nid < 0) ? 0 : nid; > +} > #else
Not sure if its entirely related, but I had tried to do https://patchwork.ozlabs.org/patch/683556/ to setup the mapping earlier, but we would have still missed the pcpu_fc_alloc. Balbir