On Fri, Sep 10, 2021 at 5:46 AM Jia He <justin...@arm.com> wrote:
>
> Previously, numa_off was set unconditionally in dummy_numa_init()
> even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
> after acpi_map_pxm_to_node() because it regards numa_off as turning
> off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> arm64 with fake numa case.
>
> Without this patch, pmem can't be probed as RAM devices on arm64 if
> SRAT table isn't present:
> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a
> 64K
> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> kmem: probe of dax0.0 failed with error -22
>
> This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
>
> Suggested-by: David Hildenbrand <da...@redhat.com>
> Signed-off-by: Jia He <justin...@arm.com>
> ---
> v2: - rebase it based on David's "memory group" patch.
>     - drop the changes in dev_dax_kmem_remove() since nid had been
>       removed in remove_memory().
>  drivers/dax/kmem.c | 31 +++++++++++++++++--------------
>  1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index a37622060fff..e4836eb7539e 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>         unsigned long total_len = 0;
>         struct dax_kmem_data *data;
>         int i, rc, mapped = 0;
> -       int numa_node;
> -
> -       /*
> -        * Ensure good NUMA information for the persistent memory.
> -        * Without this check, there is a risk that slow memory
> -        * could be mixed in a node with faster memory, causing
> -        * unavoidable performance issues.
> -        */
> -       numa_node = dev_dax->target_node;
> -       if (numa_node < 0) {
> -               dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> -                               numa_node);
> -               return -EINVAL;
> -       }
> +       int numa_node = dev_dax->target_node;
>
>         for (i = 0; i < dev_dax->nr_range; i++) {
>                 struct range range;
> @@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>                                         i, range.start, range.end);
>                         continue;
>                 }
> +
> +               /*
> +                * Ensure good NUMA information for the persistent memory.
> +                * Without this check, there is a risk but not fatal that slow
> +                * memory could be mixed in a node with faster memory, causing
> +                * unavoidable performance issues. Warn this and use fallback
> +                * node id.
> +                */
> +               if (numa_node < 0) {
> +                       int new_node =
> memory_add_physaddr_to_nid(range.start);
> +
> +                       dev_info(dev, "changing nid from %d to %d for DAX
> region [%#llx-%#llx]\n",
> +                               numa_node, new_node, range.start, range.end);
> +                       numa_node = new_node;
> +               }
> +
>                 total_len += range_len(&range);

This fallback change belongs where the parent region for the namespace
adopts its target_node, because it's not clear
memory_add_physaddr_to_nid() is the right fallback in all situations.
Here is where this setting is happening currently:

drivers/acpi/nfit/core.c:3004: ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
drivers/acpi/nfit/core.c:3007: ndr_desc->target_node = NUMA_NO_NODE;
drivers/nvdimm/e820.c:29: ndr_desc.target_node = nid;
drivers/nvdimm/of_pmem.c:58: ndr_desc.target_node = ndr_desc.numa_node;
drivers/nvdimm/region_devs.c:1127: nd_region->target_node = ndr_desc->target_node;

...where is this pmem region originating on this arm64 platform?
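
To make the suggestion concrete, here is a rough userspace model of
resolving the fallback once, at the point where the region adopts its
target_node, instead of in the kmem probe path. This is only a sketch
and not kernel code: struct region_desc_model, model_physaddr_to_nid()
and region_resolve_target_node() are hypothetical names, and the address
split used by the stand-in for memory_add_physaddr_to_nid() is made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical model of a region descriptor: only the fields discussed here. */
struct region_desc_model {
	unsigned long long start;	/* physical start address of the region */
	int target_node;		/* node id, or NUMA_NO_NODE if unknown */
};

/*
 * Hypothetical stand-in for memory_add_physaddr_to_nid(). The real helper is
 * arch-specific; this one uses an invented split purely for illustration.
 */
static int model_physaddr_to_nid(unsigned long long addr)
{
	return addr >= 0x200000000ULL ? 1 : 0;
}

/*
 * Resolve the node once, where the region adopts target_node. A node that
 * firmware already supplied is kept; only NUMA_NO_NODE takes the fallback.
 */
static int region_resolve_target_node(const struct region_desc_model *desc)
{
	assert(desc != NULL);

	if (desc->target_node != NUMA_NO_NODE)
		return desc->target_node;

	return model_physaddr_to_nid(desc->start);
}
```

With the fallback applied at that level, each bus provider (nfit, e820,
of_pmem) could choose whether the physical-address lookup is appropriate
for it, and consumers like kmem would never see a negative target_node.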