Brice
I might have been using the wrong params to hwloc_get_area_memlocation in my
original version, but I bypassed it and have been calling
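int get_numa_domain(void *page)
{
    // the address must be aligned to a 4 KiB page boundary
    HPX_ASSERT( (std::size_t(page) & 4095) == 0 );
    void *pages[1] = { page };
    int status[1] = { -1 };
    // nodes == nullptr: nothing is moved, the kernel only reports in
    // status[] the node each page currently resides on
    if (syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0) == 0)
    {
        if (status[0]>=0 && status[0]<=HPX_HAVE_MAX_NUMA_DOMAIN_COUNT) {
            return status[0];
        }
        // negative status is an errno value for this page
        return -1;
    }
    throw std::runtime_error("Failed to get numa node for page");
}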
this function instead, testing one page address at a time. I still see
this kind of pattern:
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
when I should see
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
I am deeply troubled by this and can't think of what to try next. I can see
that the memory contents hold the correct CPU ID of the thread that touched the
memory, so either the syscall is wrong or the kernel is doing something else.
I welcome any suggestions on what might be wrong.
Thanks for trying to help.
JB
-----Original Message-----
From: Brice Goglin <[email protected]>
Sent: 26 January 2019 10:19
To: Biddiscombe, John A. <[email protected]>
Cc: Hardware locality user list <[email protected]>
Subject: Re: [hwloc-users] unusual memory binding results
On 25/01/2019 at 23:16, Biddiscombe, John A. wrote:
>> move_pages() returning 0 with -14 in the status array? As opposed to
>> move_pages() returning -1 with errno set to 14, which would definitely be a
>> bug in hwloc.
> I think it was move_pages returning zero with -14 in the status array, and
> then hwloc returning 0 with an empty nodeset (which I then messed up by
> calling get bitmap first and assuming 0 meant numa node zero and not checking
> for an empty nodeset).
>
> I'm not sure why I get -EFAULT status rather than -ENOENT, but that's what I'm
> seeing in the status field when I pass the pointer returned from the
> alloc_membind call.
The only reason I see for getting -EFAULT there would be that you pass the
buffer to move_pages (what hwloc_get_area_memlocation() wants, a start pointer
and length) instead of a pointer to an array of page addresses (move_pages
wants a void** pointing to individual pages).
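To illustrate the difference between the two calling conventions, here is a minimal sketch (not hwloc's actual implementation) that turns a start pointer and length into the array of page addresses that move_pages expects, then queries the node of each page. The helper names page_addresses and query_numa_nodes are hypothetical, and the sketch assumes Linux with the raw __NR_move_pages syscall, as in the function earlier in this thread.

```cpp
#include <sys/syscall.h>   // __NR_move_pages
#include <unistd.h>        // syscall, sysconf
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: list the page-aligned addresses that cover
// the byte range [start, start + length).
std::vector<void*> page_addresses(void* start, std::size_t length,
                                  std::size_t page_size)
{
    // page_size must be a non-zero power of two for the masking below
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    std::vector<void*> pages;
    if (length == 0)
        return pages;
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(page_size) - 1);
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(start);
    const std::uintptr_t first = begin & mask;
    const std::uintptr_t last = (begin + length - 1) & mask;
    for (std::uintptr_t p = first; p <= last; p += page_size)
        pages.push_back(reinterpret_cast<void*>(p));
    return pages;
}

// Hypothetical helper: one status entry per page, either a node id (>= 0)
// or a negative errno value for that page.
std::vector<int> query_numa_nodes(void* start, std::size_t length)
{
    const std::size_t page_size =
        static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::vector<void*> pages = page_addresses(start, length, page_size);
    std::vector<int> status(pages.size(), -1);
    if (pages.empty())
        return status;
    // pid 0 = calling process; nodes == nullptr = query only, move nothing.
    // Note that pages.data() is a void** to the address list, not the buffer.
    long rc = syscall(__NR_move_pages, 0, pages.size(), pages.data(),
                      nullptr, status.data(), 0);
    if (rc != 0)
        throw std::runtime_error("move_pages failed");
    return status;
}
```

Per the move_pages(2) man page, a per-page status of -EFAULT means the page is a zero page or is not mapped by the process, while -ENOENT means the page is not present, so a buffer should be touched (written) before it is queried.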
Brice