Thanks, I copied useful information from this thread and some links to https://github.com/open-mpi/hwloc/issues/143
However, not sure I'll have time to look at this in the near future :/

Brice


On 07/01/2016 09:03, Matthias Reich wrote:
> Hello,
>
> To check whether kstat is able to report the psrset definitions, I
> defined a set consisting of 2 CPUs (psrset -c 1-2), CPU1 and CPU2. The
> remaining CPUs (CPU0, CPU3..CPU23) were left undefined.
>
> On the machine, we can execute the "kstat" command and receive (among
> 1000s of lines) the following info:
>
> module: unix                            instance: 0
> name:   pset                            class:    misc
>         avenrun_15min                   70
>         avenrun_1min                    53
>         avenrun_5min                    47
>         crtime                          0
>         ncpus                           22
>         runnable                        1146912
>         snaptime                        80083.491239257
>         updates                         790784
>         waiting                         0
>
>
> module: unix                            instance: 1
> name:   pset                            class:    misc
>         avenrun_15min                   0
>         avenrun_1min                    0
>         avenrun_5min                    0
>         crtime                          79983.070416351
>         ncpus                           2
>         runnable                        0
>         snaptime                        80083.595839172
>         updates                         1005
>         waiting                         0
>
> which is not very comprehensive and doesn't even tell which CPUs are
> part of the particular set, but could probably be used to at least warn
> about the existence of a CPU set and prevent the (not very intuitive)
> error message and consequent abort.
>
> However, doing the same on the machine without the pset defined, we get:
>
> module: unix                            instance: 0
> name:   pset                            class:    misc
>         avenrun_15min                   50
>         avenrun_1min                    38
>         avenrun_5min                    41
>         crtime                          0
>         ncpus                           24
>         runnable                        1163866
>         snaptime                        81105.346688035
>         updates                         801003
>         waiting                         0
>
> so the (only) processor set encompasses all 24 (virtual) cores. This
> may be the key to check for.
>
> The C-API to check for processor set(s) is available through the
> libpool library, which allows more resource pool configuration than
> just processor sets, but can probably act as an abstraction layer for
> different Solaris flavors...
>
> Matthias
>
>> Hello
>> So processor sets are not taken into account when Solaris reports
>> topology information in kstat etc.
>> Do you know if hwloc can query processor sets from the C interface?
>> If so, we could apply the processor set mask to hwloc object cpusets
>> during discovery to avoid your error.
>> Brice
>>
>> On 05/01/2016 10:18, Karl Behler wrote:
>>> There was a processor set defined (command psrset) on this machine.
>>> Having removed the psrset hwloc-info produces a result without error
>>> messages:
>>>
>>> hwloc-info -v
>>> depth 0: 1 Machine (type #1)
>>> depth 1: 2 NUMANode (type #2)
>>> depth 2: 2 Package (type #3)
>>> depth 3: 12 Core (type #5)
>>> depth 4: 24 PU (type #6)
>>>
>>> It seems the concept of defining a psrset is in contradiction to what
>>> hwloc and/or openmpi expects/allows.
>>>
>>>
>>> On 04.01.16 18:16, Karl Behler wrote:
>>>> We used to run our MPI application with the SUNWhpc implementation
>>>> from Sun/Oracle. (This was derived from openmpi 1.5.)
>>>> However, the Oracle HPC implementation fails for the new Solaris 11.3
>>>> platform.
>>>> So we downloaded and made openmpi 1.10.1 on this platform from
>>>> scratch.
>>>>
>>>> All seems fine and a simple test application runs fine.
>>>> However, with the real application we are running into a hwloc
>>>> problem.
>>>>
>>>> So we also downloaded and made the hwloc package 1.11.2.
>>>>
>>>> Now examining hardware locality we get the following error:
>>>>
>>>> hwloc-info -v --whole-io
>>>> ****************************************************************************
>>>>
>>>>
>>>> * hwloc 1.11.2 has encountered what looks like an error from the
>>>> operating system.
>>>> *
>>>> * Core (P#0 cpuset 0x00001001) intersects with NUMANode (P#1 cpuset
>>>> 0x0003c001) without inclusion!
>>>> * Error occurred in topology.c line 1046
>>>> *
>>>> * The following FAQ entry in the hwloc documentation may help:
>>>> * What should I do when hwloc reports "operating system" warnings?
>>>> * Otherwise please report this error message to the hwloc user's
>>>> mailing list,
>>>> * along with any relevant topology information from your platform.
>>>> ****************************************************************************
>>>>
>>>>
>>>> depth 0: 1 Machine (type #1)
>>>> depth 1: 2 Package (type #3)
>>>> depth 2: 2 NUMANode (type #2)
>>>> depth 3: 1 Core (type #5)
>>>> depth 4: 24 PU (type #6)
>>>>
>>>> Since I could not find the mentioned FAQ topic I'm asking the list
>>>> for advice.
>>>>
>>>> Our system is an Oracle/ Solaris 11.3 (latest patch level) on an
>>>> Intel hardware platform from Sun.
>>>>
>>>> output of uname -a -> SunOS sxaug28 5.11 11.3 i86pc i386 i86pc
>>>> output of psrinfo -v ->
>>>>
>>>> Status of virtual processor 0 as of: 01/04/2016 17:10:17
>>>> on-line since 01/04/2016 14:44:28.
>>>> The i386 processor operates at 1600 MHz,
>>>> and has an i387 compatible floating point processor.
>>>> Status of virtual processor 1 as of: 01/04/2016 17:10:17
>>>> on-line since 01/04/2016 14:45:10.
>>>> The i386 processor operates at 1600 MHz,
>>>> and has an i387 compatible floating point processor.
>>>> .
>>>> . (similar lines removed)
>>>> .
>>>> Status of virtual processor 23 as of: 01/04/2016 17:10:17
>>>> on-line since 01/04/2016 14:45:11.
>>>> The i386 processor operates at 1600 MHz,
>>>> and has an i387 compatible floating point processor.
>>>>
>>>> Following comes the script which was used to make hwloc: (used
>>>> compiler: Sunstudio 12.4, see config.log as bz2 attachment)
>>>>
>>>> setenv CFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch
>>>> -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all -xO5"
>>>> setenv CXXFLAGS "$CFLAGS"
>>>> setenv FCFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch
>>>> -xprefetch_level=2 -xvector=simd -stackvar -xO5"
>>>> setenv FFLAGS "$FCFLAGS"
>>>> setenv PREFIX /usr/openmpi/hwloc-1.11.2
>>>> ./configure --prefix=$PREFIX --disable-debug
>>>> dmake -j 12
>>>> # as root: make install
>>>> #        : cp -p config.status $PREFIX/config.status
>>>>
>>>> Any advice much appreciated.
>>>>
>>>> Karl
>>>>
>>>>
>>>> _______________________________________________
>>>> hwloc-users mailing list
>>>> hwloc-users_at_[hidden]
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>>> Searchable archives:
>>>> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1236.php
>>>
>>>
>>> --
>>> Dr. Karl Behler
>>> CODAC & IT services ASDEX Upgrade
>>> phon +49 89 3299-1351 fax 3299-961351
>>>
>>>
>>>
>>> _______________________________________________
>>> hwloc-users mailing list
>>> hwloc-users_at_[hidden]
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1236.php
>
>
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> Link to this post:
> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1240.php
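
For reference, the two ideas discussed in this thread (detecting a processor set from the kstat "ncpus" value of pset instance 0, and masking object cpusets so they no longer "intersect without inclusion") can be sketched in plain C as below. This is only an illustration under stated assumptions, not hwloc code: cpumask_t and all three function names are hypothetical, a 64-bit integer stands in for hwloc's bitmap type, and the "visible" mask used in the usage note is made up, since the thread does not say which PUs were in the processor set on Karl's machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuset: one bit per PU (virtual processor).
 * 64 bits are enough for the 24 PUs of the machine in this thread. */
typedef uint64_t cpumask_t;

/* True when a and b share at least one PU but neither contains the other.
 * This is the condition named by the "intersects with ... without
 * inclusion" error message. */
bool intersects_without_inclusion(cpumask_t a, cpumask_t b)
{
    return (a & b) != 0 && (a & ~b) != 0 && (b & ~a) != 0;
}

/* Heuristic taken from the kstat output above (an assumption, not a
 * documented rule): if pset instance 0 reports fewer CPUs than the
 * machine has, at least one user-defined processor set exists. */
bool pset_is_defined(unsigned default_pset_ncpus, unsigned total_cpus)
{
    assert(default_pset_ncpus <= total_cpus);
    return default_pset_ncpus < total_cpus;
}

/* Restrict an object's cpuset to the PUs that should remain visible,
 * i.e. apply a processor set mask during discovery. */
cpumask_t restrict_to_visible(cpumask_t obj, cpumask_t visible)
{
    return obj & visible;
}
```

With the values reported above, intersects_without_inclusion(0x00001001, 0x0003c001) is true, pset_is_defined(22, 24) is true and pset_is_defined(24, 24) is false. If, purely as an example, PU 12 were hidden (visible = ~0x1000), the Core cpuset would shrink to 0x1, which is fully included in the NUMANode cpuset 0x0003c001. On a real system the mask would have to come from the Solaris processor set interfaces rather than from a constant.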