The answer is "no", I don't have root access, but I suspect that would be the 
right fix, since it is currently set to [always]; either madvise or never would 
be a good option. If it is of interest, I'll ask someone to try it and report 
back on what happens.

-----Original Message-----
From: Brice Goglin <brice.gog...@inria.fr> 
Sent: 29 January 2019 15:39
To: Biddiscombe, John A. <biddi...@cscs.ch>; Hardware locality user list 
<hwloc-users@lists.open-mpi.org>
Subject: Re: [hwloc-users] unusual memory binding results

Only the one in brackets is set; the others are the unset alternatives.

If you write "madvise" in that file, it'll become "always [madvise] never".
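
As root it's just a matter of echoing "madvise" into that file; doing it
programmatically is only a plain file write. A minimal sketch (needs root,
and it's only an illustration, not anything hwloc-specific):

  #include <stdio.h>

  int main(void)
  {
      /* switch transparent huge pages to madvise-only mode (root required) */
      FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "w");
      if (!f) { perror("fopen"); return 1; }
      fputs("madvise", f);
      fclose(f);
      return 0;
  }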

Brice


On 29/01/2019 at 15:36, Biddiscombe, John A. wrote:
> On the 8 numa node machine
>
> $cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> is set already, so I'm not really sure what should go in there to disable it.
>
> JB
>
> -----Original Message-----
> From: Brice Goglin <brice.gog...@inria.fr>
> Sent: 29 January 2019 15:29
> To: Biddiscombe, John A. <biddi...@cscs.ch>; Hardware locality user 
> list <hwloc-users@lists.open-mpi.org>
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Oh, that's very good to know. I guess lots of people using first touch will 
> be affected by this issue. We may want to add a hwloc memory flag doing 
> something similar.
>
> Do you have root access to verify that writing "never" or "madvise" in 
> /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?
>
> Brice
>
>
>
> On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
>> Brice
>>
>> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>>
>> seems to make things behave much more sensibly. I had no idea it was a 
>> thing, but one of my colleagues pointed me to it.
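>>
>> Roughly what that looks like in isolation (a stripped-down sketch: a plain
>> char buffer and posix_memalign stand in for our real templated allocator,
>> so the names and sizes here are just for illustration):
>>
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>   #include <sys/mman.h>
>>
>>   int main(void)
>>   {
>>       size_t bytes = 512 * 4096;   /* enough to span a whole 2MB huge page */
>>       char *addr = NULL;
>>
>>       /* page-aligned allocation so madvise() accepts the address */
>>       if (posix_memalign((void **)&addr, 4096, bytes) != 0)
>>           return 1;
>>
>>       /* ask the kernel not to back this range with transparent huge pages */
>>       if (madvise(addr, bytes, MADV_NOHUGEPAGE) != 0)
>>           perror("madvise(MADV_NOHUGEPAGE)");
>>
>>       /* first-touch from the pinned threads would happen after this point */
>>       free(addr);
>>       return 0;
>>   }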
>>
>> Problem seems to be solved for now. Thank you very much for your insights 
>> and suggestions/help.
>>
>> JB
>>
>> -----Original Message-----
>> From: Brice Goglin <brice.gog...@inria.fr>
>> Sent: 29 January 2019 10:35
>> To: Biddiscombe, John A. <biddi...@cscs.ch>; Hardware locality user 
>> list <hwloc-users@lists.open-mpi.org>
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Crazy idea: 512 pages could be replaced with a single 2MB huge page.
>> You're not requesting huge pages in your allocation, but some systems have 
>> transparent huge pages enabled by default (e.g. RHEL: 
>> https://access.redhat.com/solutions/46111)
>>
>> This could explain why 512 pages get allocated on the same node, but it 
>> wouldn't explain the crazy patterns you've seen in the past.
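>>
>> One way to check whether that is what's happening (a rough sketch; it just
>> sums the AnonHugePages fields of /proc/self/smaps after the pages have been
>> touched, and the helper name is mine):
>>
>>   #include <stdio.h>
>>
>>   /* total kB of anonymous memory currently backed by transparent huge
>>    * pages in this process, or -1 if smaps can't be read */
>>   static long anon_huge_kb(void)
>>   {
>>       FILE *f = fopen("/proc/self/smaps", "r");
>>       char line[256];
>>       long total = 0, kb;
>>       if (!f)
>>           return -1;
>>       while (fgets(line, sizeof line, f))
>>           if (sscanf(line, "AnonHugePages: %ld kB", &kb) == 1)
>>               total += kb;
>>       fclose(f);
>>       return total;
>>   }
>>
>>   int main(void)
>>   {
>>       printf("AnonHugePages in use: %ld kB\n", anon_huge_kb());
>>       return 0;
>>   }
>>
>> If that reports something non-zero right after the first touch, THP is
>> backing part of the allocation.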
>>
>> Brice
>>
>>
>>
>>
>> On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
>>> I simplified things: instead of writing to a 2D array, I allocate a 1D array 
>>> of bytes and touch pages in a linear fashion. Then I call 
>>> syscall(__NR_move_pages, ...) and retrieve a status array for each page in 
>>> the data.
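>>>
>>> Stripped right down it looks roughly like this (a single-threaded sketch, so
>>> it only shows the status query, not the per-numa-node touching; the buffer
>>> size and names are just for illustration):
>>>
>>>   #include <stdio.h>
>>>   #include <stdlib.h>
>>>   #include <string.h>
>>>   #include <unistd.h>
>>>   #include <sys/syscall.h>
>>>
>>>   int main(void)
>>>   {
>>>       size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
>>>       size_t npages   = 511;
>>>       char  *buf      = NULL;
>>>
>>>       if (posix_memalign((void **)&buf, pagesize, npages * pagesize) != 0)
>>>           return 1;
>>>       memset(buf, 1, npages * pagesize);        /* first-touch every page */
>>>
>>>       void **pages  = malloc(npages * sizeof(void *));
>>>       int   *status = malloc(npages * sizeof(int));
>>>       for (size_t i = 0; i < npages; i++)
>>>           pages[i] = buf + i * pagesize;
>>>
>>>       /* nodes == NULL: don't move anything, just report each page's node */
>>>       if (syscall(__NR_move_pages, 0, npages, pages, NULL, status, 0) == 0) {
>>>           for (size_t i = 0; i < npages; i++)
>>>               printf("%d ", status[i]);         /* numa node, or -errno */
>>>           printf("\n");
>>>       }
>>>       return 0;
>>>   }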
>>>
>>> When I allocate 511 pages and touch alternate pages on alternate numa nodes:
>>>
>>> Numa page binding 511
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>>
>>> But as soon as I increase to 512 pages, it breaks:
>>>
>>> Numa page binding 512
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>
>>> On the 8 numa node machine, it sometimes gives the right answer even with 
>>> 512 pages.
>>>
>>> Still baffled.
>>>
>>> JB
>>>
>>> -----Original Message-----
>>> From: hwloc-users <hwloc-users-boun...@lists.open-mpi.org> On Behalf Of 
>>> Biddiscombe, John A.
>>> Sent: 28 January 2019 16:14
>>> To: Brice Goglin <brice.gog...@inria.fr>
>>> Cc: Hardware locality user list <hwloc-users@lists.open-mpi.org>
>>> Subject: Re: [hwloc-users] unusual memory binding results
>>>
>>> Brice
>>>
>>>> Can you print the pattern before and after thread 1 touched its pages, or 
>>>> even in the middle ?
>>>> It looks like somebody is touching too many pages here.
>>> Experimenting with different threads touching one or more pages, I get 
>>> unpredictable results.
>>>
>>> Here on the 8 numa node device, the result is perfect. I am only allowing 
>>> threads 3 and 7 to write a single memory location:
>>>
>>> get_numa_domain() 8 Domain Numa pattern
>>> --------
>>> --------
>>> --------
>>> 3-------
>>> --------
>>> --------
>>> --------
>>> 7-------
>>> ============================
>>>
>>> ============================
>>> Contents of memory locations
>>> 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 26 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 63 0 0 0 0 0 0 0
>>> ============================
>>>
>>> You can see that core 26 (numa domain 3) wrote to memory, and so did 
>>> core 63 (numa domain 7).
>>>
>>> Now I run it a second time and, look, it's rubbish:
>>>
>>> get_numa_domain() 8 Domain Numa pattern
>>> 3-------
>>> 3-------
>>> 3-------
>>> 3-------
>>> 3-------
>>> 3-------
>>> 3-------
>>> 3-------
>>> ============================
>>>
>>> ============================
>>> Contents of memory locations
>>> 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 26 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> 63 0 0 0 0 0 0 0
>>> ============================
>>>
>>> After allowing the data to be read by a random thread, the domain pattern becomes:
>>>
>>> 37777777
>>> 37777777
>>> 37777777
>>> 37777777
>>> 37777777
>>> 37777777
>>> 37777777
>>> 37777777
>>>
>>> I'm baffled.
>>>
>>> JB
>>>
_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
