Hi Ralph,
this is a follow-up on Siegmar's post that started at
https://www.mail-archive.com/users@lists.open-mpi.org/msg31177.html
mpiexec -np 3 --host loki:2,exin hello_1_mpi
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
hello_1_mpi
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
loki is a physical machine with 2 NUMA nodes, 2 sockets, ...
*but* exin is a virtual machine with *no* NUMA nodes, 2 sockets, ...
my guess is that mpirun finds some NUMA objects on 'loki', so it uses
the default mapping policy (aka --map-by numa). unfortunately, exin has
no NUMA objects, so mpirun fails with an error message that is hard to
interpret.
as a workaround, it is possible to run
mpirun --map-by socket
so if i understand and remember correctly, mpirun should make the
decision to map by numa *after* it receives the topology from exin and
not before.
does that make sense?
can you please take care of that?
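The ordering suggested above can be sketched as follows. This is a minimal illustration in Python, not Open MPI code: the function names are hypothetical, the input is assumed to be one lstopo XML dump per host with objects stored as <object type="..."> elements, and falling back to "socket" is an assumption that simply mirrors the workaround.

```python
import xml.etree.ElementTree as ET


def object_types(topology_xml):
    """Return the set of hwloc object types found in one lstopo XML dump."""
    root = ET.fromstring(topology_xml)
    return {obj.get("type") for obj in root.iter("object")}


def choose_default_mapping(topologies):
    """Pick a default mapping level only after all topologies are known.

    topologies maps a host name to its lstopo XML string.
    Hypothetical policy: use "numa" only if every host reports at least
    one NUMANode object, otherwise fall back to "socket".
    """
    if all("NUMANode" in object_types(xml) for xml in topologies.values()):
        return "numa"
    return "socket"
```

With one topology that contains NUMANode objects and one that does not, choose_default_mapping returns "socket", whereas deciding from the first host alone would have picked "numa".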
fwiw, in order to reproduce the issue, i ran
lstopo --of xml > /tmp/topo.xml
on two nodes, manually removed the NUMANode and Bridge objects from
the topology of the second node, and then ran
mpirun --mca hwloc_base_topo_file /tmp/topo.xml --host n0:2,n1 -np 3 hostname
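The manual edit of the second node's topology could be scripted along these lines. This is only a sketch under assumptions: strip_objects is a hypothetical helper, it assumes the lstopo XML stores objects as nested <object type="..."> elements, it promotes the child objects of each removed NUMANode to the parent and drops each Bridge together with everything below it, and it does not check that hwloc accepts the resulting file.

```python
import xml.etree.ElementTree as ET


def strip_objects(element, splice=("NUMANode",), drop=("Bridge",)):
    """Edit an lstopo XML tree in place.

    Objects whose type is in `splice` are removed and their child
    objects are promoted to the parent. Objects whose type is in
    `drop` are removed together with their whole subtree.
    """
    kept = []
    for child in list(element):
        strip_objects(child, splice, drop)  # clean the subtree first
        obj_type = child.get("type") if child.tag == "object" else None
        if obj_type in drop:
            continue  # discard the object and everything below it
        if obj_type in splice:
            # keep only the nested objects, not the removed node's own info
            kept.extend(c for c in child if c.tag == "object")
        else:
            kept.append(child)
    element[:] = kept


if __name__ == "__main__":
    tree = ET.parse("/tmp/topo.xml")
    strip_objects(tree.getroot())
    tree.write("/tmp/topo.xml")
```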
Cheers,
Gilles