On the 8 numa node machine

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

already has [always] selected, so I'm not really sure what should go in there to disable it.

JB

-----Original Message-----
From: Brice Goglin <brice.gog...@inria.fr> 
Sent: 29 January 2019 15:29
To: Biddiscombe, John A. <biddi...@cscs.ch>; Hardware locality user list 
<hwloc-users@lists.open-mpi.org>
Subject: Re: [hwloc-users] unusual memory binding results

Oh, that's very good to know. I guess lots of people using first touch will be 
affected by this issue. We may want to add a hwloc memory flag doing something 
similar.

Do you have root access to verify that writing "never" or "madvise" in 
/sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?
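
(Presumably "echo never > /sys/kernel/mm/transparent_hugepage/enabled" as 
root, or echoing "madvise" to restrict THP to regions that explicitly request 
it with madvise(MADV_HUGEPAGE).)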

Brice



Le 29/01/2019 à 14:02, Biddiscombe, John A. a écrit :
> Brice
>
> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>
> seems to make things behave much more sensibly. I had no idea it was a thing, 
> but one of my colleagues pointed me to it.
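>
> For the record, roughly the shape of it as an untested sketch (the
> posix_memalign allocation is just illustrative; T and n stand for the
> element type and count in our code):
>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
>
> /* Allocate page-aligned memory and opt it out of transparent huge pages
>  * before any page is first-touched. */
> static void *alloc_no_thp(size_t bytes)
> {
>     void *addr = NULL;
>     long pagesize = sysconf(_SC_PAGESIZE);
>     if (posix_memalign(&addr, (size_t)pagesize, bytes) != 0)
>         return NULL;
>     /* Must come before the pages are touched, otherwise THP may already
>      * have backed the range with a 2MB page on the first toucher's node. */
>     madvise(addr, bytes, MADV_NOHUGEPAGE);
>     return addr;
> }
>
> Usage is then T *data = alloc_no_thp(n * sizeof(T)); followed by each
> thread first-touching the pages it owns.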
>
> Problem seems to be solved for now. Thank you very much for your insights and 
> suggestions/help.
>
> JB
>
> -----Original Message-----
> From: Brice Goglin <brice.gog...@inria.fr>
> Sent: 29 January 2019 10:35
> To: Biddiscombe, John A. <biddi...@cscs.ch>; Hardware locality user 
> list <hwloc-users@lists.open-mpi.org>
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Crazy idea: 512 pages could be replaced with a single 2MB huge page
> (512 x 4kB pages = 2MB). You're not requesting huge pages in your
> allocation, but some systems have transparent huge pages enabled by
> default (e.g. RHEL https://access.redhat.com/solutions/46111).
>
> This could explain why 512 pages get allocated on the same node, but it
> wouldn't explain the crazy patterns you've seen in the past.
>
> Brice
>
>
>
>
> Le 29/01/2019 à 10:23, Biddiscombe, John A. a écrit :
>> I simplified things and instead of writing to a 2D array, I allocate a 1D 
>> array of bytes and touch pages in a linear fashion.
>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for
>> each page in the data.
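>>
>> The query part looks roughly like this (a sketch using the numaif.h
>> wrapper from libnuma instead of the raw syscall; link with -lnuma):
>>
>> #include <numaif.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> /* Print the NUMA node currently backing each page of 'addr'. Passing a
>>  * NULL 'nodes' array makes move_pages() query only and fill 'status'. */
>> static void print_page_nodes(void *addr, unsigned long npages)
>> {
>>     void **pages = malloc(npages * sizeof(*pages));
>>     int *status = malloc(npages * sizeof(*status));
>>     long pagesize = sysconf(_SC_PAGESIZE);
>>     for (unsigned long i = 0; i < npages; i++)
>>         pages[i] = (char *)addr + i * pagesize;
>>     if (move_pages(0 /* this process */, npages, pages, NULL, status, 0) == 0)
>>         for (unsigned long i = 0; i < npages; i++)
>>             printf("%d ", status[i]); /* node id, or -errno if unmapped */
>>     printf("\n");
>>     free(pages);
>>     free(status);
>> }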
>>
>> When I allocate 511 pages and touch alternate pages on alternate numa 
>> nodes
>>
>> Numa page binding 511
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>
>> but as soon as I increase to 512 pages, it breaks.
>>
>> Numa page binding 512
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> On the 8 numa node machine it sometimes gives the right answer even with 512 
>> pages.
>>
>> Still baffled
>>
>> JB
>>
>> -----Original Message-----
>> From: hwloc-users <hwloc-users-boun...@lists.open-mpi.org> On Behalf Of 
>> Biddiscombe, John A.
>> Sent: 28 January 2019 16:14
>> To: Brice Goglin <brice.gog...@inria.fr>
>> Cc: Hardware locality user list <hwloc-users@lists.open-mpi.org>
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Brice
>>
>>> Can you print the pattern before and after thread 1 touched its pages, or 
>>> even in the middle ?
>>> It looks like somebody is touching too many pages here.
>> Experimenting with different threads touching one or more pages, I
>> get unpredictable results.
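>>
>> The touch test itself is done from a pinned thread, roughly like this
>> sketch (pinning via hwloc_set_cpubind is an assumption about how our
>> runtime does it; the arguments are placeholders):
>>
>> #include <hwloc.h>
>>
>> /* Pin the calling thread to one core, then first-touch one page, which
>>  * should place that page on the core's NUMA node. */
>> static void touch_from_core(hwloc_topology_t topo, unsigned core_index,
>>                             volatile char *addr, long pagesize, size_t page)
>> {
>>     hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, core_index);
>>     hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_THREAD);
>>     addr[page * pagesize] = 1;
>> }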
>>
>> Here on the 8 numa node device the result is perfect. I am only
>> allowing threads 3 and 7 to each write a single memory location:
>>
>> get_numa_domain() 8 Domain Numa pattern
>> --------
>> --------
>> --------
>> 3-------
>> --------
>> --------
>> --------
>> 7-------
>> ============================
>>
>> ============================
>> Contents of memory locations
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 26 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 63 0 0 0 0 0 0 0
>> ============================
>>
>> You can see that core 26 (numa domain 3) wrote to memory, and so did
>> core 63 (numa domain 7).
>>
>> Now I run it a second time and look, it's rubbish:
>>
>> get_numa_domain() 8 Domain Numa pattern
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> ============================
>>
>> ============================
>> Contents of memory locations
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 26 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 63 0 0 0 0 0 0 0
>> ============================
>>
>> After allowing the data to be read by a random thread:
>>
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>>
>> I'm baffled.
>>
>> JB
>>
_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
