Dear Brice,

I am not sure why this is happening, since all the code seems to use the same
hwloc library version (1.8), but it does :) An MPI program is started through
SLURM on two nodes with four CPU cores in total (divided over the nodes), using
the following script:

#! /bin/bash
#SBATCH -N 2 -n 4
/usr/bin/mpiexec /usr/bin/lstopo --version
/usr/bin/mpiexec /usr/bin/lstopo --of xml
/usr/bin/mpiexec /path/to/my_mpi_code

When this is submitted multiple times, it gives “out-of-order” warnings in
about 9 out of 10 submissions and completes without warnings in the remaining
1 out of 10. I attached the output (with XML) for both the working and the
broken case. Note that the XML is of course printed multiple times, once for
each task/core, and the copies are not identical. As always, any help would be
appreciated.
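Because the XML is printed once per task and the copies differ, one way to see which tasks disagree is to have each rank write its own file and then group identical files. This is a minimal sketch, not something from the thread: it assumes each rank has written a file such as topo-rank0.xml into a shared directory (for example with mpiexec sh -c 'lstopo "topo-rank$OMPI_COMM_WORLD_RANK.xml"', relying on the OMPI_COMM_WORLD_RANK variable Open MPI sets for each process), and the helper name group_by_checksum is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: group lstopo XML files by checksum so that ranks
# which discovered a different topology stand out.
# Assumes file names without whitespace.
group_by_checksum() {
    for f in "$@"; do
        # cksum prints "<crc> <byte count>"; use both as the grouping key
        key=$(cksum < "$f" | awk '{ print $1 "-" $2 }')
        printf '%s %s\n' "$key" "$f"
    done | LC_ALL=C sort | awk '
        # Print one line per distinct key: "<key>: file1 file2 ..."
        $1 != prev {
            if (NR > 1) printf "\n"
            printf "%s:", $1
            prev = $1
        }
        { printf " %s", $2 }
        END { if (NR > 0) printf "\n" }'
}
```

Running group_by_checksum topo-rank*.xml on a broken and on a working submission would show whether the broken runs consistently produce more than one distinct topology.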

Regards,

Pim Schellart

P.S. $ mpirun --version
mpirun (Open MPI) 1.6.5
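Since Brice's explanation depends on which hwloc version wrote each XML, a quick cross-check is to extract that version from every saved XML file. Minimal sketch, under the assumption (worth verifying against the actual logs) that hwloc 1.x records its version as an <info name="hwlocVersion" value="..."/> attribute on the root object; the function name xml_hwloc_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <hwloc version>" for each XML file.
# Assumes the version appears as <info name="hwlocVersion" value="X"/>;
# files without such an attribute are reported as "unknown".
xml_hwloc_version() {
    for f in "$@"; do
        # Keep only the value of the first hwlocVersion attribute found
        v=$(sed -n 's/.*<info name="hwlocVersion" value="\([^"]*\)".*/\1/p' "$f" | head -n 1)
        printf '%s %s\n' "$f" "${v:-unknown}"
    done
}
```

If every file reports 1.8, the XML-from-an-older-hwloc explanation would not apply to these runs.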

Attachment: broken.log

Attachment: working.log


> On 07 Dec 2014, at 13:50, Brice Goglin <brice.gog...@inria.fr> wrote:
> 
> Hello
> The GitHub issue you're referring to was closed 18 months ago. The
> warning (it's not an error) is only supposed to appear if you're
> importing into a recent hwloc an XML file that was exported from an
> old hwloc. I don't see how that could happen when using Open MPI,
> since the hwloc versions on both sides are the same.
> Make sure you're not confusing it with another error, described here:
> 
> http://www.open-mpi.org/projects/hwloc/doc/v1.10.0/a00028.php#faq_os_error
> Otherwise, please report the exact Open MPI and/or hwloc versions, as
> well as the XML lstopo output on the nodes that raise the warning
> (lstopo foo.xml). Send these to one of the hwloc mailing lists, such as
> hwloc-us...@open-mpi.org or hwloc-de...@open-mpi.org.
> Thanks
> Brice
> 
> 
> On 07/12/2014 13:29, Pim Schellart wrote:
>> Dear OpenMPI developers,
>> 
>> This might be a bit off-topic, but when using the SLURM scheduler (with
>> cpuset support) on Ubuntu 14.04 (openmpi 1.6), hwloc sometimes gives an
>> “out-of-order topology discovery” error. According to issue #103 on GitHub
>> (https://github.com/open-mpi/hwloc/issues/103), this error was discussed
>> before, and it was possible to sort it out in “insert_object_by_parent”. Is
>> this still being considered? If not, which (top-level) hwloc API call should
>> we look for in the SLURM sources to start debugging? Any help will be most
>> welcome.
>> 
>> Kind regards,
>> 
>> Pim Schellart
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/12/16441.php
> 
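On the original question of where to start looking in the SLURM sources: the warning is tied to XML import, so one starting point is to locate the calls that load a topology or import/export it as XML. Minimal sketch; the function name is hypothetical, the list of hwloc 1.x functions is ours rather than from the thread, and --include assumes GNU or BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: list call sites of hwloc 1.x functions that load a
# topology or import/export it as XML, in C sources under directory $1.
# --include is supported by GNU and BSD grep, not by strict POSIX grep.
find_hwloc_xml_calls() {
    grep -rnE --include='*.c' --include='*.h' \
        'hwloc_topology_(load|set_xml|set_xmlbuffer|export_xml|export_xmlbuffer)[[:space:]]*\(' \
        "$1"
}
```

Calls to hwloc_topology_set_xml or hwloc_topology_set_xmlbuffer are where an imported XML would enter, so those are the ones to check first.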
