Dear List/Brice,

I experimented with disabling the memory touch on threads except for N=1,2,3,4 etc. and found a problem in hwloc: the function hwloc_get_area_memlocation was returning 0 when the status of the null memory move operation was -14 (#define EFAULT 14 /* Bad address */). This happened when I called hwloc_get_area_memlocation immediately after allocating, without touching the memory. I think that if the status is an error, the function should probably return -1. I'll file a bug and send a patch if this is considered to be a bug.
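To make the suggested behaviour concrete, here is a minimal sketch of a page query that reports -1 whenever the per-page status is negative. It is not the hwloc implementation: `node_from_status` and `get_numa_domain_checked` are hypothetical names used only for illustration, and the sketch assumes a Linux system where `SYS_move_pages` is available.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper (not part of hwloc): map the result of a move_pages
// query to a NUMA node index, or -1 on any error.
//   rc     : return value of the syscall (0 on success)
//   status : per-page status; a node number >= 0, or a negative errno
//            such as -EFAULT (-14), the value reported above.
inline int node_from_status(long rc, int status) {
    if (rc != 0) return -1;     // the call itself failed
    if (status < 0) return -1;  // per-page error: do not report it as node 0
    return status;
}

// Hypothetical wrapper: ask the kernel which node holds the page containing
// addr. Passing a null "nodes" array means nothing is moved; the kernel only
// fills in the status array.
inline int get_numa_domain_checked(const void* addr) {
#ifdef SYS_move_pages
    const long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    // Round the address down to the start of its page.
    const auto raw = reinterpret_cast<std::uintptr_t>(addr);
    const auto mask = ~(static_cast<std::uintptr_t>(page_size) - 1);
    void* pages[1] = { reinterpret_cast<void*>(raw & mask) };
    int status[1] = { -1 };
    const long rc = syscall(SYS_move_pages, 0, 1UL, pages, nullptr, status, 0);
    return node_from_status(rc, status[0]);
#else
    (void)addr;
    return -1;  // assumption: no move_pages syscall on this platform
#endif
}
```

With this shape, a caller can distinguish "page is on node 0" from "page has no location yet", which is the distinction that was lost when the error status was reported as 0.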
I then modified the test routine to write the value returned from sched_getcpu into the touched memory location, to verify that the thread binding was doing the right thing. The output below from the AMD 8 numanode machine looks good: threads 0, 8, 16 etc. each touch memory, following the pattern expected from the 8 numanode test.

My get_numa_domain() function, however, does not report the right numanode. It looks correct for the first column (matrices are stored in column major order), but after that it falls to pieces. In this test I'm allocating tiles of 512x512 doubles, so each tile column is 4096 bytes, giving one tile column per page and 512 pages per tile. All the memory locations check out and the patterns seem fine, but the call to

    // edited version of the one in hwloc source
    syscall(__NR_move_pages, 0, 1, pages, nullptr, status, 0) == 0

is not returning the numanode that I expect to see from the first touch when it is enabled.

Either the syscall is wrong, or the first touch/next touch doesn't work (could the alloc routine be wrong?)

    hwloc_alloc_membind(topo, len, bitmap->get_bmp(),
        (hwloc_membind_policy_t)(policy),
        flags | HWLOC_MEMBIND_BYNODESET);

where the nodeset should match the numanode mask (I will double check that right now).

Any ideas on what to try next?
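For reference, the comparison between the sched_getcpu values and the expected pattern can be written down as a small check. This is only a sketch: the function names are hypothetical, and the mapping from CPU id to numanode (8 consecutive CPU ids per domain) is an assumption inferred from the CPU ids 0, 8, 16, ... 56 in the output below, not something read from the topology.

```cpp
#include <cassert>
#include <cstddef>

// Tile geometry from the test: 512 x 512 doubles per tile, column major.
constexpr std::size_t tile_dim = 512;
constexpr std::size_t column_bytes = tile_dim * sizeof(double);  // one 4096-byte page

constexpr int num_domains = 8;
// ASSUMPTION: CPU ids are numbered contiguously, 8 per numanode.
constexpr int cpus_per_domain = 8;

// Expected numanode for the tile at (row, col): each row of the expected
// pattern is the previous row shifted cyclically by one.
constexpr int expected_domain(int row, int col) {
    return (row + col) % num_domains;
}

// Numanode implied by a sched_getcpu() value, under the assumption above.
constexpr int domain_of_cpu(int cpu) {
    return cpu / cpus_per_domain;
}

// True when the CPU id written into a tile matches the expected pattern.
inline bool touch_matches(int row, int col, int cpu) {
    assert(cpu >= 0);
    return domain_of_cpu(cpu) == expected_domain(row, col);
}
```

Applied to the printed values, every entry of the sched_getcpu grid satisfies `touch_matches`, which is why the thread binding looks right while the values from get_numa_domain() do not.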
Thanks

JB

get_numa_domain() 8 Domain Numa pattern
00740640
10740640
20740640
30740640
40740640
50740640
60740640
70740640
============================

============================
Contents of memory locations = sched_getcpu()
 0  8 16 24 32 40 48 56
 8 16 24 32 40 48 56  0
16 24 32 40 48 56  0  8
24 32 40 48 56  0  8 16
32 40 48 56  0  8 16 24
40 48 56  0  8 16 24 32
48 56  0  8 16 24 32 40
56  0  8 16 24 32 40 48
============================

============================
Expected 8 Domain Numa pattern
01234567
12345670
23456701
34567012
45670123
56701234
67012345
70123456
============================

_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users