‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Monday, 27 January 2020 11:54, Todor Kondić <tk.c...@protonmail.com> wrote:
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, 19 January 2020 11:25, Todor Kondić <tk.c...@protonmail.com> wrote:
>
> > I am getting mpirun errors when trying to execute a simple
> >
> >   mpirun -np 1 program
> >
> > command (where program is e.g. 'ls') in a container environment.
> > The error is usually:
> >
> >   All nodes which are allocated for this job are already filled.
> >
> > which makes no sense, as I am trying this on my workstation (single
> > socket, four cores -- your off-the-shelf i5 CPU) with no scheduling
> > system enabled.
> >
> > I set up the container with this command:
> >
> >   guix environment -C -N --ad-hoc -m default.scm
> >
> > where default.scm is:
> >
> >   (use-modules (guix packages))
> >   (specifications->manifest
> >    `(;; Utilities
> >      "less"
> >      "bash"
> >      "make"
> >      "openssh"
> >      "guile"
> >      "nano"
> >      "glibc-locales"
> >      "gcc-toolchain@7.4.0"
> >      "gfortran-toolchain@7.4.0"
> >      "python"
> >      "openmpi"
> >      "fftw"
> >      "fftw-openmpi"
> >      ;; Plus everything in %base-packages, by name.
> >      ,@(map (lambda (x) (package-name x)) %base-packages)))
> >
> > Simply installing openmpi (guix package -i openmpi) in my usual Guix
> > profile just works out of the box. So there must be some quirk that
> > leaves the containerised openmpi installation blind to settings
> > present in the usual environment.
>
> For the environment above, if the mpirun invocation is changed to
> provide the hostname,
>
>   mpirun --host $HOSTNAME:4 -np 4 ls
>
> ls is executed in four processes and the output is, as expected, four
> times the contents of the current directory.
>
> Of course, ls is not an MPI program. However, this elementary Fortran
> MPI code
>
> ----------------------------------------------------------------------
>   program testrun2
>     use mpi
>     implicit none
>     integer :: ierr
>
>     call mpi_init(ierr)
>     call mpi_finalize(ierr)
>   end program testrun2
> ----------------------------------------------------------------------
>
> fails with runtime errors on any number of processes.
>
> The compilation line was:
>
>   mpif90 test2.f90 -o testrun2
>
> The mpirun command:
>
>   mpirun --host $HOSTNAME:4 -np 4 ./testrun2
>
> Let me reiterate: in the normal user environment there is no need to
> declare the host and its maximum number of slots, and the runtime
> errors are gone.
>
> Could it be that the openmpi package needs a few other basic
> dependencies not present in the package declaration for the particular
> case of a single-node (normal PC) machine?
>
> Also, I noted that gfortran/mpif90 ignores the CPATH and LIBRARY_PATH
> environment variables; I had to pass the paths explicitly via -I and
> -L flags to the compiler (see the note at the end of this message).

After playing around a bit more, I can confirm that a pure guix
environment does work. Therefore, my solution is to drop the -C flag
and use --pure when developing and testing MPI code on my workstation
(a short recap is at the end of this message).

Of course, it would be interesting to find out why OpenMPI stops
working inside the "-C" environment. The closest solved problem I could
find on the net concerns friction between OpenMPI's new vader
shared-memory module and Docker containers
(https://github.com/open-mpi/ompi/issues/4948). The circumvention
technique recommended there did not work for me, but it feels related.
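For the record, the circumvention recommended in that issue is, as far
as I understand it, to disable vader's single-copy (CMA) mechanism, or
to bypass the vader BTL altogether. Roughly (parameter names taken from
the issue thread, so treat this as a sketch):

  mpirun --mca btl_vader_single_copy_mechanism none \
         --host $HOSTNAME:4 -np 4 ./testrun2

  # Or, more bluntly, skip shared memory and fall back to TCP:
  mpirun --mca btl self,tcp --host $HOSTNAME:4 -np 4 ./testrun2

As said above, this did not fix things inside the "-C" container for
me, but someone more familiar with OpenMPI internals may see why.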
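On the CPATH/LIBRARY_PATH point: guix environment exports the profile
location in the GUIX_ENVIRONMENT variable, so the explicit flags I
mention amount to something like

  mpif90 -I"$GUIX_ENVIRONMENT/include" -L"$GUIX_ENVIRONMENT/lib" \
         test2.f90 -o testrun2

(the include/lib subdirectories are an assumption; adjust to wherever
the headers and libraries actually land in your profile).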
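To recap, the working (non-container) session on my workstation now
looks roughly like this, with the same default.scm as above:

  guix environment --pure --ad-hoc -m default.scm
  mpif90 test2.f90 -o testrun2
  mpirun -np 4 ./testrun2

No --host or slot declarations are needed there.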