Brice

I might have been using the wrong params to hwloc_get_area_memlocation in my 
original version, but I bypassed it and have been calling

        // Query the NUMA node of a single (page-aligned) address.
        // Passing nodes == nullptr tells move_pages to query without moving.
        int get_numa_domain(void *page)
        {
            HPX_ASSERT((std::size_t(page) & 4095) == 0);

            void *pages[1] = { page };
            int  status[1] = { -1 };
            if (syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0) == 0) {
                // valid nodes are 0 .. count-1; a negative status is a -errno
                if (status[0] >= 0 && status[0] < HPX_HAVE_MAX_NUMA_DOMAIN_COUNT) {
                    return status[0];
                }
                return -1;
            }
            throw std::runtime_error("Failed to get numa node for page");
        }

this function instead, testing one page address at a time. I still see this 
kind of pattern:
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
00101101010111101010100101010101101001101101010111010111011101010100000101010000
when I should see:
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010
01010101010101010101010101010101010101010101010101010101010101010101010101010101
10101010101010101010101010101010101010101010101010101010101010101010101010101010

I am deeply troubled by this and can't think of what to try next. I can see 
that the memory contents hold the correct CPU ID of the thread that touched 
each page, so either the syscall is wrong or the kernel is doing something 
else. I welcome any suggestions about what might be wrong.

Thanks for trying to help.

JB

-----Original Message-----
From: Brice Goglin <brice.gog...@inria.fr> 
Sent: 26 January 2019 10:19
To: Biddiscombe, John A. <biddi...@cscs.ch>
Cc: Hardware locality user list <hwloc-users@lists.open-mpi.org>
Subject: Re: [hwloc-users] unusual memory binding results

Le 25/01/2019 à 23:16, Biddiscombe, John A. a écrit :
>> move_pages() returning 0 with -14 in the status array? As opposed to 
>> move_pages() returning -1 with errno set to 14, which would definitely be a 
>> bug in hwloc.
> I think it was move_pages returning zero with -14 in the status array, and 
> then hwloc returning 0 with an empty nodeset (which I then mishandled by 
> calling get bitmap first and assuming 0 meant NUMA node zero instead of 
> checking for an empty nodeset).
>
> I'm not sure why I get -EFAULT status rather than -ENOENT, but that's what 
> I'm seeing in the status field when I pass the pointer returned from the 
> alloc_membind call.

The only reason I can see for getting -EFAULT there is that you passed the 
buffer itself to move_pages (what hwloc_get_area_memlocation() wants: a start 
pointer and a length) instead of a pointer to an array of page addresses 
(move_pages wants a void** pointing to individual pages).
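To make that failure mode concrete, here is a sketch of my own (not hwloc 
code; the buffer size and function name are illustrative) contrasting the two 
calls. The wrong variant is left as a comment because it is exactly what 
produces a return value of 0 with -14 (-EFAULT) in every status slot:

```cpp
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdlib>
#include <cstring>
#include <vector>

// Query the per-page NUMA nodes of a freshly touched 4-page buffer,
// built the right way: move_pages receives a void*[] of page addresses.
std::vector<int> query_pages_correctly()
{
    long pagesize = sysconf(_SC_PAGESIZE);
    const std::size_t npages = 4;
    char *buf = static_cast<char*>(aligned_alloc(pagesize, npages * pagesize));
    std::memset(buf, 1, npages * pagesize);      // first-touch the pages

    std::vector<int> status(npages, -1);

    // Wrong: passing the buffer itself makes the kernel read page
    // "addresses" out of the buffer *contents*, so every status entry
    // comes back -EFAULT (-14) even though the call returns 0:
    //   syscall(__NR_move_pages, 0, npages, buf, nullptr, status.data(), 0);

    // Right: an array of void*, one page-aligned address per entry.
    std::vector<void*> pages(npages);
    for (std::size_t i = 0; i < npages; ++i)
        pages[i] = buf + i * pagesize;
    syscall(__NR_move_pages, 0, npages, pages.data(), nullptr,
            status.data(), 0);

    std::free(buf);
    return status;                               // each entry is a node >= 0
}
```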

Brice


_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
