I was trying to use hwloc-bind this morning and was a bit confused by its
syntax. The help message says:
-----
Usage: topobind [options] <location> -- command ...
<location> may be a space-separated list of cpusets or objects
as supported by the hwloc-mask utility.
-----
(Shouldn't that say hwloc-bind rather than topobind?)
I actually think hwloc-bind will be used more frequently than hwloc-mask
(whose help message and man page don't explain at all what the program does!),
so the <location> explanation should also go in the help message and man page
for hwloc-bind.
Looking at the --help message of hwloc-mask, I see the following:
-----
<string> may be <depth:index>
- <depth> may be system, machine, node, socket, core, proc or a numeric depth
- <index> may be
. X one object with index X
. X-Y all objects with index between X and Y
. X- all objects with index at least X
. X:N N objects starting with index X, possibly wrapping-around the
end of the level
. all all objects
. odd all objects with odd index
. even all objects with even index
- several <depth:index> may be concatenated with `.' to select some specific
children
<string> may also be a cpuset string
-----
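To check that I'm reading the <index> forms above correctly, here is a minimal
Python sketch of my understanding. expand_index and level_size are my own
hypothetical names, not hwloc code; I'm assuming the objects on a level are
numbered 0 to level_size-1 and that "X-Y" includes both endpoints.

```python
def expand_index(spec: str, level_size: int) -> list[int]:
    """Expand one <index> string into a sorted list of object indexes.

    Hypothetical helper illustrating my reading of the hwloc-mask help text;
    assumes the level's objects are numbered 0 .. level_size-1.
    """
    if spec == "all":
        return list(range(level_size))
    if spec == "odd":
        return list(range(1, level_size, 2))
    if spec == "even":
        return list(range(0, level_size, 2))
    if ":" in spec:
        # X:N -> N objects starting at X, wrapping around the end of the level
        start, count = (int(s) for s in spec.split(":"))
        return sorted({(start + i) % level_size for i in range(count)})
    if spec.endswith("-"):
        # X- -> every object with index at least X
        return list(range(int(spec[:-1]), level_size))
    if "-" in spec:
        # X-Y -> every object with index between X and Y (inclusive, assumed)
        lo, hi = (int(s) for s in spec.split("-"))
        return list(range(lo, min(hi, level_size - 1) + 1))
    # X -> exactly one object
    return [int(spec)]


print(expand_index("6:4", 8))  # wraps around: [0, 1, 6, 7]
print(expand_index("odd", 8))  # [1, 3, 5, 7]
```

If the real parser treats any of these forms differently (for example, "X-Y"
excluding Y), that would be worth spelling out in the help text.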
I assume the <string> here in hwloc-mask is the same as the <location> in
hwloc-bind.
Questions:
1. Is the index syntax "X,Y[,Z[...]]" supported? I don't see it in the list,
but I was curious whether it works anyway, e.g., "proc:0,1,4". That seems
useful (it is shorter than "proc:0.proc:1.proc:4"). I can file a feature
request if it isn't already supported.
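For the feature request, a rough Python sketch of what I have in mind follows.
The comma-list form is only a proposal (the help text doesn't document it), the
function names are hypothetical, and I'm assuming "proc:0,1,4" would mean the
same thing as "proc:0.proc:1.proc:4".

```python
def parse_location(location: str) -> tuple[str, list[int]]:
    """Parse a proposed "depth:X,Y,Z" location such as "proc:0,1,4".

    Hypothetical: the comma-list form is a feature request, not documented
    hwloc-mask syntax.
    """
    depth, _, index = location.partition(":")
    # Each comma-separated item is a single object index.
    indexes = [int(item) for item in index.split(",")]
    return depth, indexes


def to_dot_form(location: str) -> str:
    """Rewrite the proposed comma form as the longer dot-concatenated form."""
    depth, indexes = parse_location(location)
    return ".".join(f"{depth}:{i}" for i in indexes)


print(to_dot_form("proc:0,1,4"))  # proc:0.proc:1.proc:4
```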
2. What does "hwloc-bind core:0 ..." mean? (I asked Samuel this over IM as
well, but I didn't understand his answer.) *Which* "core 0" does it refer to?
For example, here is an abbreviated version of my lstopo output (it's a
pre-production EX machine, so I can't share all the details; I 'x'ed out some
of the numerical values):
-----
System(xxxGB)
  Node#0(xxxGB) + Socket#0 + L3(xxxMB)
    L2(xxxKB) + L1(xxxKB) + Core#0 + P#0
    ...
  Node#1(xxxGB) + Socket#2 + L3(xxxMB)
    L2(xxxKB) + L1(xxxKB) + Core#0 + P#1
    ...
-----
The processors have unique numbers, but the cores do not. Is that a bug?
3. What is the difference between "system" and "machine"?
4. What exactly does "index" refer to: a virtual index (e.g., hwloc's own
numbering from 0 to N) or the OS's index? I thought we used OS index
numbering, but #2 confuses me. If #2 is just a bug, then perhaps this
question is moot. :-)
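To make the two readings in #4 concrete, here is a small Python sketch that
uses only the two cores visible in the lstopo excerpt above. The Core class and
both lookup functions are hypothetical illustrations, not hwloc code, and I'm
assuming the Core#N labels are per-socket numbers reported by the OS.

```python
from dataclasses import dataclass


@dataclass
class Core:
    socket_os_index: int  # "Socket#N" in the lstopo output
    core_os_index: int    # "Core#N" in the lstopo output (assumed OS-provided)
    pu_os_index: int      # "P#N" in the lstopo output


# Only the two cores shown above; everything elided with "..." is left out.
cores = [
    Core(socket_os_index=0, core_os_index=0, pu_os_index=0),
    Core(socket_os_index=2, core_os_index=0, pu_os_index=1),
]


def cores_matching_os_index(index: int) -> list[Core]:
    """OS-index reading: every core whose OS number equals index (may be several)."""
    return [c for c in cores if c.core_os_index == index]


def core_at_logical_index(index: int) -> Core:
    """Virtual-index reading: the core at position index, numbering the level 0..N-1."""
    return cores[index]


print(len(cores_matching_os_index(0)))  # 2: "core:0" is ambiguous here
print(core_at_logical_index(1))         # the core on Socket#2
```

Under the OS-index reading, "core:0" matches more than one core; under the
virtual-index reading, it names exactly one.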
5. What exactly is a "cpuset string"? Could you provide some examples?
--> Side note: I find hwloc's use of the word "cpuset" quite confusing because
it is *NOT* the same as an OS cpuset. Is there any chance we could choose
another word for this hwloc concept for v1.0? If we don't, I suspect we'll keep
explaining the difference to people who don't read (or who forget) the glossary
section of the docs.
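As an example of the kind of documentation I'm hoping for, here is a Python
sketch under an assumption I haven't verified: that a cpuset string is a
hexadecimal bitmask in which bit i stands for processor i. Both function names
are hypothetical, and the real format may differ (for instance, in how very
wide masks are split).

```python
def pus_to_mask_string(pu_indexes: list[int]) -> str:
    """Build a hex bitmask string where bit i stands for processor i (assumed format)."""
    mask = 0
    for i in pu_indexes:
        mask |= 1 << i  # set the bit for processor i
    return hex(mask)


def mask_string_to_pus(mask_string: str) -> list[int]:
    """Inverse of pus_to_mask_string: list the processors whose bits are set."""
    mask = int(mask_string, 16)  # int() accepts an optional "0x" prefix in base 16
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


print(pus_to_mask_string([0, 1, 4]))  # 0x13
print(mask_string_to_pus("0x13"))     # [0, 1, 4]
```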
6. The help message says "several <depth:index> may be concatenated with
`.'...". Does that mean the following is legal?
core:0.node:2.system:4
If so, what exactly does it mean when the specifications overlap? Is it simply
the union of those three? Also, I'm curious: why was a period chosen as the
delimiter instead of a comma? Is this a Europe-vs-US thing? (In the US, we
typically use commas to separate list items; is it different in Europe?)
Note that a comma delimiter gets a little iffy if #1 becomes supported: a comma
would then separate both multiple indexes and multiple depths. Hrm.
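One way to keep the two delimiters apart is sketched below in Python, assuming
'.' separates depth:index pairs and ',' separates items within a single index
(the proposal from #1). Neither rule is confirmed hwloc behavior, and
split_location is a hypothetical name.

```python
def split_location(location: str) -> list[tuple[str, list[str]]]:
    """Split "depth:index.depth:index..." into (depth, [index items]) pairs.

    Assumes '.' separates depth:index pairs and ',' separates items inside one
    index; both rules are assumptions, not documented hwloc-mask syntax.
    """
    pairs = []
    for part in location.split("."):
        depth, _, index = part.partition(":")
        pairs.append((depth, index.split(",")))
    return pairs


print(split_location("socket:0,1.core:2"))
# [('socket', ['0', '1']), ('core', ['2'])]
```

If commas were used for both, "socket:0,1,core:2" would need an extra rule,
such as treating an item without a ':' as another index of the preceding depth.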
--
Jeff Squyres