On the 8 numa node machine,

  $ cat /sys/kernel/mm/transparent_hugepage/enabled
  [always] madvise never

[always] is set already, so I'm not really sure what should go in there to disable it.

JB
-----Original Message-----
From: Brice Goglin <brice.gog...@inria.fr>
Sent: 29 January 2019 15:29
To: Biddiscombe, John A. <biddi...@cscs.ch>; Hardware locality user list <hwloc-users@lists.open-mpi.org>
Subject: Re: [hwloc-users] unusual memory binding results

Oh, that's very good to know. I guess lots of people using first touch will be affected by this issue. We may want to add a hwloc memory flag doing something similar.

Do you have root access to verify that writing "never" or "madvise" in /sys/kernel/mm/transparent_hugepage/enabled fixes the issue too?

Brice

On 29/01/2019 at 14:02, Biddiscombe, John A. wrote:
> Brice
>
> madvise(addr, n * sizeof(T), MADV_NOHUGEPAGE)
>
> seems to make things behave much more sensibly. I had no idea it was a thing, but one of my colleagues pointed me to it.
>
> Problem seems to be solved for now. Thank you very much for your insights and suggestions/help.
>
> JB
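For reference, the workaround described in the message above boils down to something like the following — a minimal, untested sketch in C++ (the mmap'd buffer and all names are illustrative; the real test presumably calls madvise on whatever its own allocator returned):

    // Minimal sketch: allocate page-aligned memory and opt it out of
    // transparent huge pages *before* anything touches it, so first-touch
    // placement happens per 4 kB page rather than per 2 MB huge page.
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        const long   page   = sysconf(_SC_PAGESIZE);
        const size_t npages = 512;                 // 512 * 4 kB == 2 MB, the problem size
        const size_t bytes  = npages * (size_t)page;

        // mmap returns page-aligned memory and does not fault the pages in.
        void* buf = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        // Ask the kernel not to back this range with transparent huge pages.
        if (madvise(buf, bytes, MADV_NOHUGEPAGE) != 0) { perror("madvise"); return 1; }

        // ... first-touch the pages from whichever threads should own them ...
        std::memset(buf, 0, bytes);                // placeholder touch

        munmap(buf, bytes);
        return 0;
    }

Unlike the system-wide sysfs setting discussed above, the madvise call needs no root access and only affects this one mapping.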
> -----Original Message-----
> From: Brice Goglin <brice.gog...@inria.fr>
> Sent: 29 January 2019 10:35
> To: Biddiscombe, John A. <biddi...@cscs.ch>; Hardware locality user list <hwloc-users@lists.open-mpi.org>
> Subject: Re: [hwloc-users] unusual memory binding results
>
> Crazy idea: 512 pages could be replaced with a single 2MB huge page. You're not requesting huge pages in your allocation, but some systems have transparent huge pages enabled by default (e.g. RHEL, https://access.redhat.com/solutions/46111).
>
> This could explain why 512 pages get allocated on the same node, but it wouldn't explain the crazy patterns you've seen in the past.
>
> Brice
>
> On 29/01/2019 at 10:23, Biddiscombe, John A. wrote:
>> I simplified things: instead of writing to a 2D array, I allocate a 1D array of bytes and touch pages in a linear fashion.
>> Then I call syscall(__NR_move_pages, ...) and retrieve a status array for each page in the data.
>>
>> When I allocate 511 pages and touch alternate pages on alternate numa nodes, I get:
>>
>> Numa page binding 511
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>> 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
>> 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
>>
>> but as soon as I increase to 512 pages, it breaks.
>>
>> Numa page binding 512
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> On the 8 numa node machine it sometimes gives the right answer even with 512 pages.
>>
>> Still baffled
>>
>> JB
>>
>> -----Original Message-----
>> From: hwloc-users <hwloc-users-boun...@lists.open-mpi.org> On Behalf Of Biddiscombe, John A.
>> Sent: 28 January 2019 16:14
>> To: Brice Goglin <brice.gog...@inria.fr>
>> Cc: Hardware locality user list <hwloc-users@lists.open-mpi.org>
>> Subject: Re: [hwloc-users] unusual memory binding results
>>
>> Brice
>>
>>> Can you print the pattern before and after thread 1 touched its pages, or even in the middle?
>>> It looks like somebody is touching too many pages here.
>>
>> Experimenting with different threads touching one or more pages, I get unpredictable results.
>>
>> Here on the 8 numa node device the result is perfect. I am only allowing threads 3 and 7 to write a single memory location.
>>
>> get_numa_domain() 8 Domain Numa pattern
>> --------
>> --------
>> --------
>> 3-------
>> --------
>> --------
>> --------
>> 7-------
>> ============================
>>
>> ============================
>> Contents of memory locations
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 26 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 63 0 0 0 0 0 0 0
>> ============================
>>
>> You can see that core 26 (numa domain 3) wrote to memory, and so did core 63 (numa domain 7).
>>
>> Now I run it a second time and look, it's rubbish:
>>
>> get_numa_domain() 8 Domain Numa pattern
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> 3-------
>> ============================
>>
>> ============================
>> Contents of memory locations
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 26 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 63 0 0 0 0 0 0 0
>> ============================
>>
>> and after allowing the data to be read by a random thread:
>>
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>> 37777777
>>
>> I'm baffled.
>>
>> JB

_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
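For reference, the placement check used throughout the quoted messages — first-touch the pages, then ask the kernel which NUMA node each page actually landed on — can be reproduced with the libnuma wrapper around the same move_pages(2) syscall. A minimal, untested sketch (single-threaded C++; in the real test each page is touched by a thread bound to a chosen node, and all names here are illustrative):

    // Minimal sketch: touch every page of a buffer, then query the NUMA node
    // each page landed on. move_pages(2) with a null "nodes" array only
    // reports placement, it does not migrate anything.
    #include <numaif.h>      // move_pages(); link with -lnuma
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const long   page   = sysconf(_SC_PAGESIZE);
        const size_t npages = 512;
        const size_t bytes  = npages * (size_t)page;

        char* buf = static_cast<char*>(mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }
        madvise(buf, bytes, MADV_NOHUGEPAGE);   // the workaround discussed above

        // First-touch: here one thread touches everything; the real test binds
        // threads to alternate NUMA nodes and lets each touch its own pages.
        for (size_t i = 0; i < npages; ++i)
            buf[i * page] = 1;

        // Query the placement of every page.
        std::vector<void*> addrs(npages);
        std::vector<int>   status(npages, -1);
        for (size_t i = 0; i < npages; ++i)
            addrs[i] = buf + i * page;

        if (move_pages(0 /* this process */, npages, addrs.data(),
                       nullptr /* query only */, status.data(), 0) != 0) {
            perror("move_pages");
            return 1;
        }
        for (size_t i = 0; i < npages; ++i)
            std::printf("%d ", status[i]);      // node id, or negative errno
        std::printf("\n");

        munmap(buf, bytes);
        return 0;
    }

Each status entry is the node id the page resides on, or a negative errno; -ENOENT means the page has not been faulted in yet, which is a convenient way to confirm that nothing touched it prematurely. Build with -lnuma.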