Do you have both mpich and openmpi installed?

Yes.

Can you give the results of "dpkg -l | grep mpich" and the same for openmpi?

root@capitanata:~# dpkg -l | grep mpich
ii  libmpich-dev:amd64   3.3~b2-7+b1   amd64   Development files for MPICH
ii  libmpich12:amd64     3.3~b2-7+b1   amd64   Shared libraries for MPICH
ii  mpich                3.3~b2-7+b1   amd64   Implementation of the MPI Message Passing Interface standard

root@capitanata:~# dpkg -l | grep openmpi
ii  gromacs-openmpi                   2018.2-2                  amd64   Molecular dynamics sim, binaries for OpenMPI parallelization
ii  libhdf5-openmpi-100:amd64         1.10.0-patch1+docs-4+b2   amd64   Hierarchical Data Format 5 (HDF5) - runtime files - OpenMPI version
ii  libhdf5-openmpi-dev               1.10.0-patch1+docs-4+b2   amd64   Hierarchical Data Format 5 (HDF5) - development files - OpenMPI version
ii  libmkl-blacs-openmpi-ilp64:amd64  2018.3.222-1              amd64   Intel® MKL : ILP64 version of BLACS routines for Open MPI
ii  libmkl-blacs-openmpi-lp64:amd64   2018.3.222-1              amd64   Intel® MKL : LP64 version of BLACS routines for Open MPI
ii  libopenmpi-dev:amd64              3.1.1.real-4+b1           amd64   high performance message passing library -- header files
ii  libopenmpi3:amd64                 3.1.1.real-4+b1           amd64   high performance message passing library -- shared library
ii  libscalapack-openmpi-dev          2.0.2-7+b1                amd64   Scalable Linear Algebra Package - Dev files for OpenMPI
ii  libscalapack-openmpi2.0           2.0.2-7+b1                amd64   Scalable Linear Algebra Package - Shared libs for OpenMPI
ii  mpqc-openmpi                      2.3.1-18                  all     Massively Parallel Quantum Chemistry Program (OpenMPI transitional package)
ii  openmpi-bin                       3.1.1.real-4+b1           amd64   high performance message passing library -- binaries
ii  openmpi-common                    3.1.1.real-4+b1           amd64   high performance message passing library -- common files
ii  openmpi-doc                       3.1.1.real-4              all     high performance message passing library -- man pages
ii  yorick-mpy-openmpi                2.2.04+dfsg1-9+b1         amd64   Message Passing Yorick (OpenMPI build)


The "alternatives" system may be confused. Check where the symlinks for /usr/bin/mpiexec, mpirun lead.

This seems ok, apparently:

root@capitanata:~# ls -l /etc/alternatives/mpirun
lrwxrwxrwx 1 root root 23 apr 21 17:09 /etc/alternatives/mpirun -> /usr/bin/mpirun.openmpi
root@capitanata:~# ls -l /etc/alternatives/mpiexec
lrwxrwxrwx 1 root root 24 apr 21 17:09 /etc/alternatives/mpiexec -> /usr/bin/mpiexec.openmpi


Try testing with mpiexec.openmpi explicitly rather than mpiexec.

I had already done that, and in any case mpiexec points to mpiexec.openmpi. No
change.

For the transport, try:

$ mpirun.openmpi -n 2 --mca btl self,tcp ./printf

Yay! This worked. With that option, my bare-bones test code runs flawlessly:

gmulas@capitanata:~/PAHmodels/anharmonica-scalapack$ mpiexec.openmpi --mca btl self,tcp sample_printf
MPI_Init call ok
My rank is = 0
number of procs is = 2

MPI_Init call ok
My rank is = 1
number of procs is = 2

MPI_Finalize call ok, returned 0
MPI_Finalize call ok, returned 0
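
(For reference, sample_printf itself is not quoted in this thread; a minimal
MPI test producing output of this shape could look roughly like the sketch
below, compiled with e.g. "mpicc sample_printf.c -o sample_printf".)

/* minimal MPI smoke test -- a sketch, not the actual sample_printf */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs, ret;

    /* initialise MPI and report success, as in the output above */
    if (MPI_Init(&argc, &argv) == MPI_SUCCESS)
        printf("MPI_Init call ok\n");

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    printf("My rank is = %d\n", rank);
    printf("number of procs is = %d\n\n", nprocs);

    /* MPI_Finalize returns MPI_SUCCESS (0) on success */
    ret = MPI_Finalize();
    printf("MPI_Finalize call ok, returned %d\n", ret);
    return 0;
}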


The same code, run without the --mca option, yields:

gmulas@capitanata:~/PAHmodels/anharmonica-scalapack$ mpiexec.openmpi -n 2 sample_printf
--------------------------------------------------------------------------
[[23445,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: capitanata

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------

and then hangs there forever.

Now, I think it has _always_ been the default of OpenMPI to try to use
InfiniBand first, if available, and then fall back to the slower tcp
transport. The question is: why does it now apparently hang while trying to
use the faster interconnect, instead of gracefully failing and moving on to
the next available, slower one, as it did before?
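
As an aside, Open MPI's MCA syntax also allows excluding a single component
rather than whitelisting transports, so something along the lines of

$ mpiexec.openmpi -n 2 --mca btl ^openib sample_printf

should mean "any BTL except openib"; whether that actually avoids the hang on
this machine was not tested in this thread.
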
Second question, a practical one: how should I configure mpiexec.openmpi so
that it uses self,tcp by default when called without extra arguments (see the
sketch below)? That would at least make OpenMPI usable (with some
configuration) and demote the bug from grave to important or even normal,
perhaps with some information about this problem and how to deal with it in a
README.Debian file.
Of course it is a workaround, not a real solution, but way better than
nothing :)
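
For what it's worth, Open MPI normally reads default MCA parameters from a
per-user and a system-wide configuration file, so a minimal sketch of such a
default (the file locations below are the usual ones, not verified here)
would be:

# ~/.openmpi/mca-params.conf            (per user)
# /etc/openmpi/openmpi-mca-params.conf  (system-wide; path may differ)
btl = self,tcp
# optionally silence the warning quoted above
btl_base_warn_component_unused = 0

The same parameter can also be set per shell via the environment variable
OMPI_MCA_btl=self,tcp.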

thanks!

Giacomo


--
_________________________________________________________________

Giacomo Mulas <giacomo.mu...@inaf.it>
_________________________________________________________________

INAF - Osservatorio Astronomico di Cagliari
via della scienza 5 - 09047 Selargius (CA)

tel.   +39 070 71180255
mob. : +39 329  6603810
_________________________________________________________________

"When the storms are raging around you, stay right where you are"
                         (Freddie Mercury)
_________________________________________________________________
