Hello,

To check whether kstat is able to report the psrset definitions, I
defined a set consisting of 2 CPUs, CPU1 and CPU2 (psrset -c 1-2). The
remaining CPUs (CPU0, CPU3..CPU23) were left unassigned.

Running the "kstat" command on this machine yields (among thousands of
lines) the following info:

module: unix                            instance: 0
name:   pset                            class:    misc
        avenrun_15min                   70
        avenrun_1min                    53
        avenrun_5min                    47
        crtime                          0
        ncpus                           22
        runnable                        1146912
        snaptime                        80083.491239257
        updates                         790784
        waiting                         0


module: unix                            instance: 1
name:   pset                            class:    misc
        avenrun_15min                   0
        avenrun_1min                    0
        avenrun_5min                    0
        crtime                          79983.070416351
        ncpus                           2
        runnable                        0
        snaptime                        80083.595839172
        updates                         1005
        waiting                         0

This is not very comprehensive and does not even tell which CPUs are
part of a particular set, but it could probably be used to at least warn
about the existence of a CPU set and to prevent the (not very intuitive)
error message and subsequent abort.

However, doing the same on the machine without the pset defined, we get:

module: unix                            instance: 0
name:   pset                            class:    misc
        avenrun_15min                   50
        avenrun_1min                    38
        avenrun_5min                    41
        crtime                          0
        ncpus                           24
        runnable                        1163866
        snaptime                        81105.346688035
        updates                         801003
        waiting                         0

so the (only) processor set encompasses all 24 virtual CPUs. Comparing
ncpus of instance 0 against the total number of CPUs may be the key check.
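
The check suggested above can be sketched in C by scanning a saved copy
of the kstat text output. This is only a sketch under assumptions: the
names pset_summary, scan_pset_kstat and has_user_psets are made up for
illustration, the parser only understands the layout shown in this mail,
and a real implementation would more likely query the kernel directly
(for example through libkstat or libpool) than parse text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of the "pset" records in a kstat text dump. */
struct pset_summary {
    int instances;     /* number of "name: pset" records seen */
    int default_ncpus; /* ncpus of instance 0, or -1 if not found */
};

/* Scan kstat-style text line by line and collect the pset records. */
struct pset_summary scan_pset_kstat(const char *text)
{
    struct pset_summary s = { 0, -1 };
    int instance = -1; /* instance from the latest "module:" line */
    int in_pset = 0;   /* non-zero while inside a pset record */
    const char *p = text;

    assert(text != NULL);
    while (*p != '\0') {
        char line[256], name[64];
        int value;
        size_t len = strcspn(p, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;

        /* Copy one line (truncated if overly long) and advance. */
        memcpy(line, p, n);
        line[n] = '\0';
        p += len;
        if (*p == '\n')
            p++;

        if (sscanf(line, " module: %*s instance: %d", &value) == 1) {
            instance = value;
            in_pset = 0;
        } else if (sscanf(line, " name: %63s", name) == 1) {
            in_pset = (strcmp(name, "pset") == 0);
            if (in_pset)
                s.instances++;
        } else if (in_pset && instance == 0 &&
                   sscanf(line, " ncpus %d", &value) == 1) {
            s.default_ncpus = value;
        }
    }
    return s;
}

/* A user-defined set exists if there is more than one pset record, or
 * if the default set (instance 0) holds fewer CPUs than the machine. */
int has_user_psets(struct pset_summary s, int total_cpus)
{
    return s.instances > 1 ||
           (s.default_ncpus >= 0 && s.default_ncpus < total_cpus);
}
```

Fed with the first dump above (two pset records, ncpus 22 in instance 0)
and a total of 24 CPUs, has_user_psets() reports a user-defined set; fed
with the second dump (one record, ncpus 24), it does not.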

The C API for querying processor sets is available through the libpool
library, which covers more resource pool configuration than just
processor sets but can probably act as an abstraction layer for
different Solaris flavors...

Matthias

Hello,
So processor sets are not taken into account when Solaris reports
topology information in kstat etc.
Do you know if hwloc can query processor sets from the C interface?
If so, we could apply the processor set mask to hwloc object cpusets
during discovery to avoid your error.
Brice
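
To illustrate the idea of applying a processor set mask to object
cpusets, here is a minimal sketch in plain C. It is not hwloc code: it
assumes at most 32 CPUs stored in a uint32_t, and the helper names
(cpu_range_mask, count_cpus, restrict_cpusets) are invented for this
example. Inside hwloc the same operation would presumably go through its
bitmap API (e.g. hwloc_bitmap_and) instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitmask with bits first..last set; one bit per CPU, at most 32 CPUs. */
uint32_t cpu_range_mask(unsigned first, unsigned last)
{
    uint32_t mask = 0;
    unsigned cpu;

    assert(first <= last && last < 32);
    for (cpu = first; cpu <= last; cpu++)
        mask |= UINT32_C(1) << cpu;
    return mask;
}

/* Number of CPUs contained in a mask. */
unsigned count_cpus(uint32_t mask)
{
    unsigned n = 0;

    for (; mask != 0; mask &= mask - 1) /* clear lowest set bit */
        n++;
    return n;
}

/* Restrict every object cpuset to the CPUs in `allowed`. */
void restrict_cpusets(uint32_t *cpusets, size_t count, uint32_t allowed)
{
    size_t i;

    assert(cpusets != NULL || count == 0);
    for (i = 0; i < count; i++)
        cpusets[i] &= allowed;
}
```

With the test setup from this thread (24 CPUs, psrset -c 1-2), the
default set is cpu_range_mask(0, 23) & ~cpu_range_mask(1, 2), i.e.
0x00fffff9 with 22 CPUs. A hypothetical object covering CPU0 and CPU1
(0x3) would be reduced to 0x1 by restrict_cpusets().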

On 05/01/2016 10:18, Karl Behler wrote:
There was a processor set defined (command psrset) on this machine.
After removing the psrset, hwloc-info produces a result without error
messages:

hwloc-info -v
depth 0:        1 Machine (type #1)
 depth 1:       2 NUMANode (type #2)
  depth 2:      2 Package (type #3)
   depth 3:     12 Core (type #5)
    depth 4:    24 PU (type #6)

It seems that defining a psrset conflicts with what hwloc and/or
openmpi expect or allow.


On 04.01.16 18:16, Karl Behler wrote:
We used to run our MPI application with the SUNWhpc implementation
from Sun/Oracle. (This was derived from openmpi 1.5.)
However, the Oracle HPC implementation fails for the new Solaris 11.3
platform.
So we downloaded openmpi 1.10.1 and built it from scratch on this platform.

All seems fine and a simple test application runs fine.
However, with the real application we are running into a hwloc problem.

So we also downloaded and built the hwloc package 1.11.2.

Now examining hardware locality we get the following error:

hwloc-info -v --whole-io
****************************************************************************
* hwloc 1.11.2 has encountered what looks like an error from the operating system.
*
* Core (P#0 cpuset 0x00001001) intersects with NUMANode (P#1 cpuset 0x0003c001) without inclusion!
* Error occurred in topology.c line 1046
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with any relevant topology information from your platform.
****************************************************************************

depth 0:        1 Machine (type #1)
 depth 1:       2 Package (type #3)
  depth 2:      2 NUMANode (type #2)
   depth 3:     1 Core (type #5)
    depth 4:    24 PU (type #6)

Since I could not find the mentioned FAQ topic, I'm asking the list
for advice.
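
For readers unfamiliar with the wording "intersects ... without
inclusion" in the error above, the condition can be sketched in plain C
on the two reported cpusets. This is an illustration only, not the
actual check in hwloc's topology.c; the enum and function names are
made up, and cpusets are assumed to fit into a uint32_t.

```c
#include <assert.h>
#include <stdint.h>

enum cpuset_relation {
    CPUSET_DISJOINT, /* no CPU in common */
    CPUSET_EQUAL,    /* identical sets */
    CPUSET_A_IN_B,   /* a is a strict subset of b */
    CPUSET_B_IN_A,   /* b is a strict subset of a */
    CPUSET_OVERLAP   /* intersect, but neither includes the other */
};

/* Classify how two cpusets relate to each other. */
enum cpuset_relation compare_cpusets(uint32_t a, uint32_t b)
{
    uint32_t common = a & b;

    if (common == 0)
        return CPUSET_DISJOINT;
    if (a == b)
        return CPUSET_EQUAL;
    if (common == a)
        return CPUSET_A_IN_B;
    if (common == b)
        return CPUSET_B_IN_A;
    return CPUSET_OVERLAP;
}

/* Self-check using the two cpusets from the error message. */
void check_reported_cpusets(void)
{
    assert(compare_cpusets(UINT32_C(0x00001001), UINT32_C(0x0003c001))
           == CPUSET_OVERLAP);
}
```

The Core cpuset 0x00001001 (CPUs 0 and 12) and the NUMANode cpuset
0x0003c001 (CPUs 0 and 14-17) share only CPU 0, so neither contains the
other. A tree-shaped topology needs one of the first four relations for
every pair of objects, which is why the last case is reported as an
error.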

Our system runs Oracle Solaris 11.3 (latest patch level) on an
Intel hardware platform from Sun.

output of uname -a -> SunOS sxaug28 5.11 11.3 i86pc i386 i86pc
output of psrinfo -v ->

Status of virtual processor 0 as of: 01/04/2016 17:10:17
  on-line since 01/04/2016 14:44:28.
  The i386 processor operates at 1600 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 01/04/2016 17:10:17
  on-line since 01/04/2016 14:45:10.
  The i386 processor operates at 1600 MHz,
        and has an i387 compatible floating point processor.
.
. (similar lines removed)
.
Status of virtual processor 23 as of: 01/04/2016 17:10:17
  on-line since 01/04/2016 14:45:11.
  The i386 processor operates at 1600 MHz,
        and has an i387 compatible floating point processor.

The following script was used to build hwloc (compiler: Sunstudio 12.4,
see config.log as bz2 attachment):

setenv CFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all -xO5"
setenv CXXFLAGS "$CFLAGS"
setenv FCFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -stackvar -xO5"
setenv FFLAGS "$FCFLAGS"
setenv PREFIX /usr/openmpi/hwloc-1.11.2
./configure --prefix=$PREFIX --disable-debug
dmake -j 12
# as root: make install
#        : cp -p config.status $PREFIX/config.status

Any advice much appreciated.

Karl


_______________________________________________
hwloc-users mailing list
hwloc-users_at_[hidden]
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
Searchable archives: 
http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1236.php


--
Dr. Karl Behler 
CODAC & IT services ASDEX Upgrade
phon +49 89 3299-1351 fax 3299-961351


