Brice, I might have been using the wrong parameters to hwloc_get_area_memlocation in my original version, but I have bypassed it and have been calling this function instead:

    #include <sys/syscall.h>   // __NR_move_pages
    #include <unistd.h>        // syscall()
    #include <cstddef>         // std::size_t
    #include <stdexcept>       // std::runtime_error
    // HPX_ASSERT and HPX_HAVE_MAX_NUMA_DOMAIN_COUNT come from HPX.

    // Return the NUMA node holding a page-aligned address, or -1 if the
    // kernel reports an error (a negative errno) for that page.
    int get_numa_domain(void *page)
    {
        // Assumes 4 KiB pages: the address must be page-aligned.
        HPX_ASSERT((std::size_t(page) & 4095) == 0);
        void *pages[1] = { page };
        int status[1] = { -1 };
        // nodes == nullptr: query only, nothing is moved.
        if (syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0) == 0) {
            // Valid node IDs are 0 .. HPX_HAVE_MAX_NUMA_DOMAIN_COUNT - 1.
            if (status[0] >= 0 && status[0] < HPX_HAVE_MAX_NUMA_DOMAIN_COUNT) {
                return status[0];
            }
            return -1;
        }
        throw std::runtime_error("Failed to get numa node for page");
    }

I am just testing one page address at a time. I still see this kind of pattern:

    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000
    00101101010111101010100101010101101001101101010111010111011101010100000101010000

when I should see:

    01010101010101010101010101010101010101010101010101010101010101010101010101010101
    10101010101010101010101010101010101010101010101010101010101010101010101010101010
    01010101010101010101010101010101010101010101010101010101010101010101010101010101
    10101010101010101010101010101010101010101010101010101010101010101010101010101010
    01010101010101010101010101010101010101010101010101010101010101010101010101010101
    10101010101010101010101010101010101010101010101010101010101010101010101010101010
    01010101010101010101010101010101010101010101010101010101010101010101010101010101
    10101010101010101010101010101010101010101010101010101010101010101010101010101010
    01010101010101010101010101010101010101010101010101010101010101010101010101010101
    10101010101010101010101010101010101010101010101010101010101010101010101010101010

I am deeply troubled by this and can't think of what to try next, since I can see that the memory contents hold the correct CPU ID of the thread that touched the memory. So either the syscall is wrong, or the kernel is doing something else. I welcome any suggestions on what might be wrong.

Thanks for trying to help.

JB

-----Original Message-----
From: Brice Goglin <brice.gog...@inria.fr>
Sent: 26 January 2019 10:19
To: Biddiscombe, John A. <biddi...@cscs.ch>
Cc: Hardware locality user list <hwloc-users@lists.open-mpi.org>
Subject: Re: [hwloc-users] unusual memory binding results

On 25/01/2019 at 23:16, Biddiscombe, John A. wrote:
>> move_pages() returning 0 with -14 in the status array? As opposed to
>> move_pages() returning -1 with errno set to 14, which would definitely be a
>> bug in hwloc.
> I think it was move_pages returning zero with -14 in the status array, and
> then hwloc returning 0 with an empty nodeset (which I then messed up by
> calling get bitmap first and assuming 0 meant numa node zero and not checking
> for an empty nodeset).
>
> I'm not sure why I get -EFAULT status rather than -ENOENT, but that's what I'm
> seeing in the status field when I pass the pointer returned from the
> alloc_membind call.

The only reason I see for getting -EFAULT there would be that you pass the buffer to move_pages (which is what hwloc_get_area_memlocation() wants: a start pointer and a length) instead of a pointer to an array of page addresses (move_pages wants a void** pointing to individual pages).

Brice

_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
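A minimal sketch of the distinction Brice draws above: move_pages() takes an array of page-aligned addresses rather than a (pointer, length) pair, and each status entry is either a node number or a negative errno for that page. The helper names (page_addresses, node_from_status, describe_status, query_nodes) are hypothetical and are not part of hwloc or HPX. The sketch assumes Linux, a power-of-two page size, and C++17 (for std::optional), and it calls the raw syscall as JB's function does rather than libnuma's move_pages() wrapper.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>
#include <sys/syscall.h>
#include <unistd.h>

// Build the array of page-aligned addresses that move_pages() expects,
// covering every page touched by [buf, buf + len).
// Assumes page_size is a power of two.
std::vector<void *> page_addresses(const void *buf, std::size_t len,
                                   std::size_t page_size)
{
    std::vector<void *> pages;
    if (len == 0) return pages;
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(page_size) - 1);
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(buf);
    const std::uintptr_t first = start & mask;
    const std::uintptr_t last = (start + len - 1) & mask;
    for (std::uintptr_t p = first; p <= last; p += page_size)
        pages.push_back(reinterpret_cast<void *>(p));
    return pages;
}

// Interpret one status entry: a non-negative value is a NUMA node, a
// negative value is -errno for that page (e.g. -ENOENT, -EFAULT).
// std::optional keeps "no node" distinct from node 0.
std::optional<int> node_from_status(int status)
{
    if (status >= 0) return status;
    return std::nullopt;
}

// Human-readable form of one status entry, for debugging output.
std::string describe_status(int status)
{
    if (status >= 0) return "node " + std::to_string(status);
    return std::string("error: ") + std::strerror(-status);
}

// Query (without moving) the node of every page in [buf, buf + len).
// Returns false if the syscall itself failed; errno is then set.
bool query_nodes(const void *buf, std::size_t len, std::vector<int> &status)
{
    const std::size_t page_size =
        static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::vector<void *> pages = page_addresses(buf, len, page_size);
    status.assign(pages.size(), -1);
    if (pages.empty()) return true;
    // pid 0 = calling process; nodes == nullptr = query only; flags = 0.
    long rc = syscall(__NR_move_pages, 0, pages.size(), pages.data(),
                      nullptr, status.data(), 0);
    return rc == 0;
}
```

With this split, a syscall-level failure (return value -1, errno set) is reported separately from per-page failures (return value 0, negative entries in status), which is the distinction raised earlier in the thread. The syscall may also fail on systems where NUMA support is unavailable, so checking errno after a false return is worthwhile.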