This is on a Solaris 11 system with hwloc 1.6.1:
% lstopo-no-graphics
Machine (4095MB) + NUMANode L#0 (P#0 4095MB) + Socket L#0
  Core L#0 + PU L#0 (P#0)
  Core L#1 + PU L#1 (P#1)
  Core L#2 + PU L#2 (P#2)
  Core L#3 + PU L#3 (P#3)
% hwloc-bind socket:0.pu:1 hwloc-bind --get
0x0000000f
I assume that output is wrong. I bind to a single PU (P#1), so I would expect
a mask of 0x00000002, but the returned mask covers all four PUs.
To confirm that binding is indeed happening and that it's the reporting that's
incorrect:
% hwloc-bind socket:0.pu:0 pbind -q
process id 1773: 0
% hwloc-bind socket:0.pu:1 pbind -q
process id 1774: 1
% hwloc-bind socket:0.pu:2 pbind -q
process id 1775: 2
% hwloc-bind socket:0.pu:3 pbind -q
process id 1776: 3
It seems to me the problem is in topology-solaris.c. In
hwloc_solaris_set_sth_cpubind(), we can bind to a single core with
processor_bind(), which is what's happening in our case. Then, in
hwloc_solaris_get_sth_cpubind(), we check for lgroup affinity but not for any
processor_bind() binding. So the processor_bind() binding goes unnoticed, we
assume we're not bound, and we report the full mask.
How about adding a check upon entry to hwloc_solaris_get_sth_cpubind(): if processor_bind() shows binding, report this and be done.
If not, then continue on with the lgroup logic that's already in that function. Yes?
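Something along these lines, perhaps. This is only a rough sketch of the idea
and not the actual hwloc code: the helper names are made up, the mask is a
plain unsigned long rather than an hwloc bitmap, and the #else branch is just
a stand-in so the sketch compiles off Solaris. The one real call is
processor_bind() with PBIND_QUERY, which returns the current binding (or
PBIND_NONE) through its last argument.

```c
#include <limits.h>

#ifdef __sun
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#else
/* Stand-in definitions so the sketch compiles off Solaris (assumption). */
typedef int processorid_t;
#define PBIND_NONE (-1)
#endif

/* Hypothetical helper: query the calling process's processor_bind() binding.
 * Returns 0 on success, with *bound set to a processor id or PBIND_NONE. */
static int query_processor_bind(processorid_t *bound)
{
#ifdef __sun
    return processor_bind(P_PID, P_MYID, PBIND_QUERY, bound);
#else
    *bound = PBIND_NONE; /* nothing to query off Solaris */
    return 0;
#endif
}

/* Hypothetical helper: pick the mask to report. A processor_bind() binding
 * wins; otherwise fall back to whatever the lgroup logic computed. */
static unsigned long reported_mask(processorid_t bound,
                                   unsigned long lgroup_mask)
{
    if (bound != PBIND_NONE && bound >= 0 &&
        (unsigned)bound < sizeof(unsigned long) * CHAR_BIT)
        return 1UL << bound;
    return lgroup_mask;
}

/* Proposed order of checks: processor_bind() first, then the lgroup result. */
static unsigned long get_cpubind_mask(unsigned long lgroup_mask)
{
    processorid_t bound;

    if (query_processor_bind(&bound) != 0)
        return lgroup_mask; /* query failed: keep the existing behaviour */
    return reported_mask(bound, lgroup_mask);
}
```

With this ordering, the socket:0.pu:1 case above would report only bit 1
instead of the full four-PU mask, and unbound processes would still go through
the lgroup path unchanged.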