Dear list,

I have a program that uses Open MPI plus multithreading, and I want the
freedom to decide which hardware cores my threads run on. That already works
with hwloc_set_cpubind(), so now I also want to bind memory to the hardware
cores. But I just can't get it to work.

Basically, I wrote the memory binding into my allocator, so the memory is
allocated and then bound. I use hwloc 2.4.1 on a Linux system, and I checked
with "hwloc-info --support" that hwloc_set_area_membind() and
hwloc_get_area_membind() are both supported.

Here is a snippet of my code, which runs without any error. However,
hwloc_get_area_membind() always reports that all memory is bound to PU 0,
when I think it should be bound to different PUs. Am I missing something?

T* allocate(size_t n, hwloc_topology_t topology, int rank)
{
  // allocate memory
  T* t = (T*)hwloc_alloc(topology, sizeof(T) * n);
  // elements per thread
  size_t ept = 1024;
  hwloc_bitmap_t set;
  size_t offset = 0;
  size_t threadcount = 4;

  set = hwloc_bitmap_alloc();
  if(!set) {
    fprintf(stderr, "failed to allocate a bitmap\n");
  }

  // bind memory to every thread
  for(size_t i = 0; i < threadcount; i++)
  {
    // logical index of where to bind the memory
    auto logid = (i + rank * threadcount) * 2;
    auto logobj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, logid);
    hwloc_bitmap_only(set, logobj->os_index);

    // set the memory binding
    // I use HWLOC_MEMBIND_BIND as policy so I do not have to touch the
    // memory first to allocate it
    auto err = hwloc_set_area_membind(topology, t + offset, sizeof(T) * ept,
                                      set, HWLOC_MEMBIND_BIND, ...);
    if(err < 0)
      std::cout << "Error: memory binding failed" << std::endl;

    // print out data of first set
    auto ii = hwloc_bitmap_first(set);
    auto obj = hwloc_get_pu_obj_by_os_index(topology, ii);
    std::cout << "Rank=" << rank << " Tid=" << i
              << " on PU logical index=" << obj->logical_index
              << " (OS/physical index " << ii << ")" << std::endl;

    // checking if memory is bound to the correct hardware core
    hwloc_membind_policy_t policy;
    err = hwloc_get_area_membind(topology, t + offset, sizeof(T) * ept,
                                 set, &policy, ...);
    if(err < 0)
      std::cout << "Error: getting memory binding failed" << std::endl;

    // print out data of hwloc_get_area_membind()
    ii = hwloc_bitmap_first(set);
    obj = hwloc_get_pu_obj_by_os_index(topology, ii);
    std::cout << "Rank=" << rank << " Tid=" << i
              << " actually on PU logical index=" << obj->logical_index
              << " (OS/physical index " << ii << ")" << std::endl;

    // increase memory offset
    offset += ept;
  }
  return t;
}
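
To make the index and offset arithmetic of the loop easier to check in isolation, here is a small sketch without any hwloc calls. It only reproduces the formula (i + rank * threadcount) * 2 and the per-thread chunk layout from the snippet; the struct ChunkPlan and the function plan_chunks are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry per thread: which PU the chunk is meant for and where it lives.
struct ChunkPlan {
    std::size_t thread;       // thread index i within the rank
    std::size_t pu_logical;   // logical PU index, (i + rank * threadcount) * 2
    std::size_t first_elem;   // index of the first element of the chunk
    std::size_t byte_offset;  // offset of the chunk from the start of the buffer
    std::size_t byte_len;     // chunk length in bytes, ept * elem_size
};

// Hypothetical helper: computes the same layout as the loop in allocate().
std::vector<ChunkPlan> plan_chunks(std::size_t rank, std::size_t threadcount,
                                   std::size_t ept, std::size_t elem_size)
{
    assert(threadcount > 0 && ept > 0 && elem_size > 0);
    std::vector<ChunkPlan> plan;
    plan.reserve(threadcount);
    for (std::size_t i = 0; i < threadcount; ++i) {
        ChunkPlan c;
        c.thread = i;
        c.pu_logical = (i + rank * threadcount) * 2;
        c.first_elem = i * ept;
        c.byte_offset = c.first_elem * elem_size;
        c.byte_len = ept * elem_size;
        plan.push_back(c);
    }
    return plan;
}
```

For rank 0 with 4 threads this gives the logical PU indices 0, 2, 4 and 6, and for rank 1 the indices 8, 10, 12 and 14. The layout also shows that the snippet relies on n being at least threadcount * ept (4096 elements here), since every chunk is bound regardless of n.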

Something that might be unrelated, but I still wanted to ask: from chapter
6 of the documentation I gather that a sort of best practice for binding
threads and memory is to first allocate the memory, then bind the memory,
and finally do the CPU binding. Am I correct in assuming this?

Thanks for taking the time to look it over.

hwloc-users mailing list
