Dear List/Brice

I experimented with disabling the memory touch on threads except for N=1,2,3,4 
etc. and found a problem in hwloc: the function hwloc_get_area_memlocation 
returned 0 even though the status from the null move_pages operation was -14 
(#define EFAULT 14 /* Bad address */). This happened when I called 
hwloc_get_area_memlocation immediately after allocating, without touching the 
memory. If the status is an error, I think the function should probably return 
-1. I'll file a bug and send a patch if this is considered to be a bug.
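
To make the suggested behaviour concrete, here is a minimal sketch of the 
mapping I have in mind. The helper name node_from_status is hypothetical and 
is not part of hwloc. It only assumes what the move_pages(2) man page says 
about a query with nodes == NULL: a non-negative status entry is a node 
number, a negative entry is a negated errno.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper (not an hwloc function): turn one per-page status
// value from a move_pages() query into a return value for the caller.
// status >= 0 : NUMA node that currently holds the page
// status <  0 : negated errno (e.g. -EFAULT, -ENOENT), no node information
int node_from_status(int status) {
    if (status < 0) {
        return -1;      // report failure instead of silently claiming node 0
    }
    return status;      // valid node number
}
```

With this convention a status of -14 (EFAULT) yields -1 rather than 0, so an 
untouched page cannot be mistaken for a page on node 0.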

I then modified the test routine to write the value returned from sched_getcpu 
into the touched memory location, to verify that the thread binding was doing 
the right thing. The output below from the AMD 8-numanode machine looks good: 
threads 0, 8, 16 etc. each touch memory in the pattern expected from the 
8-numanode test. My get_numa_domain function, however, does not report the 
right numanode. It looks correct for the first column (matrices are stored in 
column-major order), but after that it falls to pieces. In this test I'm 
allocating tiles of 512x512 doubles, so each tile column is 4096 bytes, giving 
one tile column per page and 512 pages per tile. All the memory locations check 
out and the patterns seem fine, but the call to 
        // edited version of the one in hwloc source
        syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0)
is not returning the numanode that I expect to see from the first touch when it 
is enabled.
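
For reference, a self-contained sketch of this kind of query is below. It is 
not the hwloc code and not my exact routine; the helper names 
page_align_down and query_page_node are made up for illustration. It assumes 
Linux, where move_pages(2) called with nodes == nullptr moves nothing and only 
fills in the status array. The address is rounded down to a page boundary 
first, and both failure paths (the syscall itself failing, and a negative 
per-page status) are passed back as a negative errno.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <sys/syscall.h>
#include <unistd.h>

// Round an address down to the start of its page.
// page_size must be a non-zero power of two.
std::uintptr_t page_align_down(std::uintptr_t addr, std::uintptr_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

// Hypothetical helper: ask the kernel which NUMA node holds the page that
// contains addr. Returns the node number (>= 0) or a negative errno.
int query_page_node(const void* addr) {
    const auto page_size = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    void* pages[1] = {reinterpret_cast<void*>(
        page_align_down(reinterpret_cast<std::uintptr_t>(addr), page_size))};
    int status[1] = {-1};

    // pid 0 means the calling process; nodes == nullptr means "query only".
    if (syscall(__NR_move_pages, 0, 1UL, pages, nullptr, status, 0) != 0) {
        return -errno;      // the call itself failed
    }
    return status[0];       // node number, or negative errno for this page
}
```

According to the man page, a page that is not present reports -ENOENT, and 
-EFAULT means a zero page or an unmapped area, which would be consistent with 
the -14 I saw on memory that was allocated but never touched.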

Either the syscall is wrong, or the first touch/nexttouch doesn't work (could 
the alloc routine be wrong?)
            hwloc_alloc_membind(topo, len, bitmap->get_bmp(),
                (hwloc_membind_policy_t)(policy),
                flags | HWLOC_MEMBIND_BYNODESET);
where the nodeset should match the numanode mask (I will double-check that 
right now).

Any ideas on what to try next?

Thanks

JB

get_numa_domain() 8 Domain Numa pattern
00740640
10740640
20740640
30740640
40740640
50740640
60740640
70740640
============================

============================
Contents of memory locations = sched_getcpu()
0 8 16 24 32 40 48 56 
8 16 24 32 40 48 56 0 
16 24 32 40 48 56 0 8 
24 32 40 48 56 0 8 16 
32 40 48 56 0 8 16 24 
40 48 56 0 8 16 24 32 
48 56 0 8 16 24 32 40 
56 0 8 16 24 32 40 48 
============================

============================
Expected 8 Domain Numa pattern
01234567
12345670
23456701
34567012
45670123
56701234
67012345
70123456
============================
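
The expected pattern above is a round-robin over tiles, node = (row + col) 
mod 8, and the sched_getcpu matrix can be checked against it. The sketch below 
is illustrative only: the function names are hypothetical, and node_of_cpu 
ASSUMES that each numanode owns a block of consecutive core ids (8 per node 
would match cpus 0, 8, 16, ... in the output above). That should be confirmed 
against the real topology.

```cpp
#include <cassert>
#include <string>

// Node expected to own tile (row, col) in the round-robin pattern.
int expected_node(int row, int col, int num_nodes) {
    assert(num_nodes > 0 && row >= 0 && col >= 0);
    return (row + col) % num_nodes;
}

// ASSUMPTION: cores are numbered so that each node owns cores_per_node
// consecutive ids. This is not verified against the actual machine.
int node_of_cpu(int cpu, int cores_per_node) {
    assert(cpu >= 0 && cores_per_node > 0);
    return cpu / cores_per_node;
}

// One row of the expected pattern as a digit string (num_nodes <= 10).
std::string expected_row(int row, int num_nodes) {
    assert(num_nodes <= 10);
    std::string digits;
    for (int col = 0; col < num_nodes; ++col) {
        digits += static_cast<char>('0' + expected_node(row, col, num_nodes));
    }
    return digits;
}
```

Under that assumption, every entry of the sched_getcpu matrix divided by 8 
reproduces the expected pattern, so the first touch is happening on the 
intended cores and the mismatch is on the query side or the allocation side.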
_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
